On Tuesday, June 23, 2015 at 10:45:11 PM UTC-5, Garrett Jenkinson wrote:
>
> Sorry if this is an overly basic question, but I searched around the 
> documentation and the user group questions and have not been able to find 
> an answer. I was wondering if there is a way in Julia to read from a formatted 
> text file, in the same way as MATLAB's textscan function:
>
> http://www.mathworks.com/help/matlab/ref/textscan.html
>
> readdlm does not seem to do what I am looking for (or maybe I'm using it 
> wrong!). Suppose I have data coming from a bed file, which is formatted 
> like this:
>
> chr1  500  34543   1.433
> chr1  46546  3543   4.68
> chr2  4543    34456  6.3545
>
> It would be nice to specify the format "chr%u %u %u %f" and to get four 
> vectors (three with Ints and one with floats):
>
> [1,1,2]
> [500,46546,4543]
> [34543,3543,34456]
> [1.433,4.68,6.3545]
>
> Is there a function to do this? If not, is there a simple way to do this 
> with the functions that are available? 
>
> Thanks!
> Garrett
>
> P.S. I know that @printf basically allows the opposite of this to be done 
> (i.e., to write out to a file by specifying a format). Basically, my 
> question is if there exists the equivalent @readf to read in something that 
> was produced by @printf?
>


A quick and dirty way of doing this would be the following:

t = readdlm(filename)'  # note the transpose at the end


There are a couple of issues here:

1: you'll need to manipulate the "chr" strings (to get rid of "chr") - see 
also #3, below
2: you've got an array instead of a set of vectors (easy to convert using 
a comprehension or other methods)
3: the array is of type "Array{Any,2}", which may lead to inefficiencies. If 
you could get rid of "chr" before reading, so that the file only contains 
things that parse into numbers, the array would be of type 
"Array{Float64,2}", which is more efficient, though still not 100% 
equivalent to what you asked for: the integer columns would be stored as 
floats.
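
As a rough sketch of points 1 and 2, the columns of the readdlm result can 
be pulled out into typed vectors. This assumes a recent Julia (1.x), where 
readdlm lives in the DelimitedFiles standard library rather than in Base, 
and it skips the transpose since columns are indexed directly. The variable 
names and the temporary file are only for illustration:

```julia
using DelimitedFiles  # assumption: Julia 1.x; older versions had readdlm in Base

# Write the sample data from the question to a temporary file.
path = tempname()
write(path, "chr1  500  34543   1.433\nchr1  46546  3543   4.68\nchr2  4543    34456  6.3545\n")

# With no delimiter given, runs of whitespace separate the fields.
# Mixed text and numbers produce a matrix with element type Any.
t = readdlm(path)

# Point 1: strip the "chr" prefix and parse what is left as an integer.
chrom = [parse(Int, replace(string(s), "chr" => "")) for s in t[:, 1]]

# Point 2: convert each remaining column into a concretely typed vector.
starts = Int.(t[:, 2])
stops  = Int.(t[:, 3])
scores = Float64.(t[:, 4])
```

Note that this only works when every chromosome name is "chr" followed by 
digits; names such as chrX would make parse throw an error.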

If you want to get more complicated, you could use split() and parse() to 
do what you need, iterating over each line. 
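
A minimal sketch of that line-by-line approach might look like the 
following. The function name read_bed is made up for this example (it is 
not a Julia built-in), and it assumes every data line has exactly the four 
fields of the "chr%u %u %u %f" format:

```julia
# Hypothetical helper: parse lines shaped like "chr<int> <int> <int> <float>"
# into four typed vectors.
function read_bed(io::IO)
    chrom  = Int[]
    starts = Int[]
    stops  = Int[]
    scores = Float64[]
    for line in eachline(io)
        fields = split(line)               # split on any run of whitespace
        length(fields) == 4 || continue    # skip blank or malformed lines
        push!(chrom,  parse(Int, replace(fields[1], "chr" => "")))
        push!(starts, parse(Int, fields[2]))
        push!(stops,  parse(Int, fields[3]))
        push!(scores, parse(Float64, fields[4]))
    end
    return chrom, starts, stops, scores
end

# Try it on the sample data from the question, using an in-memory buffer.
sample = IOBuffer("chr1  500  34543   1.433\nchr1  46546  3543   4.68\nchr2  4543    34456  6.3545\n")
chrom, starts, stops, scores = read_bed(sample)
```

For a real file, something like open(read_bed, filename) should pass the 
opened stream to the same function. 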

You could also use DataFrames to do something like this:

t = readtable(filename, separator = ' ', header=false)

but you'd still require some manipulation.

There also may be better / more clever ways, but these are the first that 
come to mind.
