Hello,
Yeah, it's interesting! I have tried it, and something like "[^ATCGatcg]" is
useful. I have a large file to deal with, so I will look for an efficient
regular expression.
Thank you.
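As a rough sketch outside Galaxy (Python's `re` module, not the Select tool itself), the negated class can be used as a reject test: a sequence is kept only if `[^ATCGatcg]` finds nothing in it. That is equivalent to requiring the whole sequence to match `[ATCGatcg]+`, so both forms can be checked against sample data before settling on one for a large file. The function names are hypothetical, and no speed difference between the two is claimed here; measure on real data if that matters.

```python
import re

# Reject test: any single character outside the class means "drop".
NON_ACGT = re.compile(r"[^ATCGatcg]")
# Accept test: the whole (non-empty) sequence must be A/C/G/T in either case.
ALL_ACGT = re.compile(r"[ATCGatcg]+")


def is_clean_by_search(seq: str) -> bool:
    """Hypothetical helper: True if no non-ACGT character occurs in seq."""
    return NON_ACGT.search(seq) is None


def is_clean_by_fullmatch(seq: str) -> bool:
    """Hypothetical helper: True if seq consists only of ACGT characters."""
    return ALL_ACGT.fullmatch(seq) is not None


if __name__ == "__main__":
    for seq in ["atcgtaaagggcgat", "gtcgttgactNNNNNNNNgtc"]:
        # Both tests agree on every non-empty sequence.
        print(seq, is_clean_by_search(seq), is_clean_by_fullmatch(seq))
```

The two tests differ only on an empty sequence: the search form accepts it, while `[ATCGatcg]+` rejects it.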
Date: Mon, 9 Dec 2013 07:24:46 -0800
From: j...@bx.psu.edu
To: zhus...@msn.cn
CC: galaxy-user@lists.bx.psu.edu
Subject: Re: [galaxy-user] How to filter the sequences containing not[ATCG] 
character?

    Hello,

    You are right! I forgot about that. Aren't regular expressions fun?
    Please test it out if you prefer your method or are just curious;
    I didn't try it that way. There are usually a few ways to do the
    same thing with a regex.

    But I am glad that this helped a bit, and good luck with the query,

    Jen
    Galaxy team

    On 12/9/13 7:06 AM, 朱师云 wrote:

      Hi,

        It indeed helps.
        Your regular expression looks brief and more useful.
        BTW, a caret (^) inside [] in the first position, as in
          [^ATCGatcg], is not a start-of-line anchor: it means any
          character that is not in [ATCGatcg]. That may not work in
          the SELECT tool.

        Thank you for your help!

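To see the difference pointed out above, here is a small sketch using Python's `re` module as a stand-in; it is an assumption, not something confirmed in this thread, that the Select tool's regex engine treats `^` the same way. Outside brackets `^` anchors the match, while as the first character inside brackets it negates the class, which is why the original pattern `\t[ATCGatcg]*[^ATCGatcg]` finds a tab followed by a run of A/C/G/T and then one other character.

```python
import re

line = "cuff103.1\tgtcgttgactNNNNNNNNgtc"

# Outside brackets, ^ anchors the match at the start of the string.
anchored = re.compile(r"^cuff")
# As the first character inside brackets, ^ negates the class:
# [^ATCGatcg] matches one character that is NOT A, T, C or G (either case).
negated = re.compile(r"[^ATCGatcg]")
# The original pattern: a tab, any run of ACGT, then one non-ACGT character.
bad_seq = re.compile(r"\t[ATCGatcg]*[^ATCGatcg]")

print(bool(anchored.search(line)))                          # True
print(negated.search("gtcgttgactNNNN").group())             # N
print(bool(bad_seq.search(line)))                           # True
print(bool(bad_seq.search("cuff102.1\tatcgtaaagggcgat")))   # False
```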
            Date: Mon, 9 Dec 2013 06:34:28 -0800
            From: j...@bx.psu.edu
            To: zhus...@msn.cn; galaxy-user@lists.bx.psu.edu
            Subject: Re: [galaxy-user] How to filter the sequences
            containing not[ATCG] character?

            Hello,

            If the data were in .fastqsanger format, you could use the
            tool "Manipulate FASTQ", but with .fasta, this is a good
            way.

            But watch your regular expression: test it out on a smaller
            set to make sure it is doing what you want. I see a "start
            of the line" character ("^") in the middle of your
            expression. I see why it could be working, since the
            preceding expression matches zero or more times (*), but
            knowing what each character does is generally a good idea.
            The tool's help is good, as are many websites, but this one
            is simple. Also, you don't need the // slashes; just enter
            the expression.

            To get you started, I would use something like this with
            the Select tool and "Matching":

            ^..*\t[ATCGatcg]+$

            (Only one dot is really required; this is just how I always
            do it. It adds a bit of a format sanity check to the filter.)

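A minimal sketch of what "Matching" with this pattern keeps, using Python's `re` module as a stand-in for the Select tool and assuming the FASTA records have already been converted to two columns (name, tab, sequence). The sample lines are hypothetical, built from the example records in this thread.

```python
import re

# Pattern suggested above, for the Select tool with "Matching":
#   ^..*            at least one character of sequence name at line start
#   \t              the tab between the name column and the sequence column
#   [ATCGatcg]+$    a sequence made only of A/C/G/T (either case) to line end
KEEP = re.compile(r"^..*\t[ATCGatcg]+$")

# Hypothetical two-column (name<TAB>sequence) lines.
lines = [
    "cuff102.1\tatcgtaaagggcgat",
    "cuff103.1\tgtcgttgactNNNNNNNNgtc",
    "\tacgt",  # no name: rejected by the leading ..* sanity check
]

kept = [line for line in lines if KEEP.search(line)]
print(kept)  # ['cuff102.1\tatcgtaaagggcgat']
```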
            Hope this helps!

            Jen
            Galaxy team

            On 12/8/13 6:21 PM, 朱师云 wrote:

              Hi Jen,
                As the title says, I have a [fasta] file obtained
                  from a [gtf] file:

                >cuff102.1
                atcgtaaagggcgat
                >cuff103.1
                gtcgttgactNNNNNNNNgtc

                and I want to filter out the sequences that contain
                  any non-[ATCG] character, so that the output looks
                  like this:

                  >cuff102.1
                  atcgtaaagggcgat

                I have a large number of sequences to filter. I thought
                  of a two-step way: first convert the file to an
                  [interval] file, then SELECT the lines not matching
                  the pattern /\t[ATCGatcg]*[^ATCGatcg]/.
                Am I right? Or is there a one-step way?

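The two-step idea above can be sketched outside Galaxy in Python: turn each FASTA record into one name-tab-sequence row, keep only the rows that do NOT match the pattern, then turn the survivors back into FASTA. The helper names are hypothetical, and the conversion here is only a stand-in for whatever Galaxy converter is actually used; multi-line sequences are joined into one row.

```python
import re

# The proposed pattern: a tab, a run of ACGT, then one non-ACGT character.
HAS_NON_ACGT = re.compile(r"\t[ATCGatcg]*[^ATCGatcg]")


def fasta_to_rows(text):
    """Step 1 (hypothetical helper): FASTA text -> name<TAB>sequence rows."""
    rows, name, seq = [], None, []
    for line in text.splitlines():
        if line.startswith(">"):
            if name is not None:
                rows.append(name + "\t" + "".join(seq))
            name, seq = line[1:], []
        elif line.strip():
            seq.append(line.strip())
    if name is not None:
        rows.append(name + "\t" + "".join(seq))
    return rows


def keep_clean(rows):
    """Step 2: keep only the rows that do NOT match the pattern."""
    return [row for row in rows if not HAS_NON_ACGT.search(row)]


def rows_to_fasta(rows):
    """Hypothetical helper: turn the surviving rows back into FASTA records."""
    return "\n".join(">" + row.replace("\t", "\n", 1) for row in rows)


example = """>cuff102.1
atcgtaaagggcgat
>cuff103.1
gtcgttgactNNNNNNNNgtc
"""

print(rows_to_fasta(keep_clean(fasta_to_rows(example))))
# >cuff102.1
# atcgtaaagggcgat
```

One difference from the positive pattern ^..*\t[ATCGatcg]+$ suggested in the reply: this "not matching" filter keeps a record whose sequence is empty, while [ATCGatcg]+ rejects it.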
              ___________________________________________________________
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org.  Please keep all replies on the list by
using "reply all" in your mail client.  For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:

  http://lists.bx.psu.edu/listinfo/galaxy-dev

To manage your subscriptions to this and other Galaxy lists,
please use the interface at:

  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:

  http://galaxyproject.org/search/mailinglists/

            -- 
Jennifer Hillman-Jackson
http://galaxyproject.org