[julia-users] Re: matlab-like textscan function?

Garrett Jenkinson Fri, 26 Jun 2015 18:57:07 -0700

Thank you very much for the feedback! 

Since there is no efficient "textscan"-like function at this time, it seems 
to me that your DataFrames suggestion is perhaps the most sensible thing 
for me to do, since it appears to be a package built to do things somewhat 
close to my original goal (along with a lot of other statistical bells and 
whistles). Therefore, I would guess it would be more efficient than the 
{Any,2} array type from readdlm.


If I do end up doing more complicated things using split() and parse(), I 
will post my solution back here with my code in case others come looking 
for a similar solution.
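
For anyone who lands here looking for the split() and parse() route, a minimal sketch might look like the following. It is only an illustration under some assumptions: it uses Julia 1.x syntax (not the syntax current when this thread was written), `read_bed` is a hypothetical helper name rather than an existing function, and every chromosome field is assumed to be "chr" followed by an integer (so names like "chrX" would throw an error).

```julia
# Hypothetical helper (not part of Base or any package): read lines of the
# form "chrN  start  stop  score" and return four typed vectors.
# Assumes Julia 1.x and purely numeric chromosome names.
function read_bed(io::IO)
    chrom = Int[]
    start = Int[]
    stop  = Int[]
    score = Float64[]
    for line in eachline(io)
        fields = split(line)              # split on any run of whitespace
        length(fields) == 4 || continue   # skip blank or malformed lines
        # Drop the "chr" prefix, then parse what remains as an integer.
        push!(chrom, parse(Int, replace(fields[1], "chr" => "")))
        push!(start, parse(Int, fields[2]))
        push!(stop,  parse(Int, fields[3]))
        push!(score, parse(Float64, fields[4]))
    end
    return chrom, start, stop, score
end
```

With a file on disk, this could be called as `chrom, start, stop, score = open(read_bed, filename)`. Because each vector has a concrete element type (Int or Float64), the result avoids the {Any,2} array discussed below.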


On Friday, June 26, 2015 at 9:06:08 AM UTC-5, Seth wrote:
>
>
>
> On Tuesday, June 23, 2015 at 10:45:11 PM UTC-5, Garrett Jenkinson wrote:
>>
>> Sorry if this is an overly basic question, but I searched around the 
>> documentation and the user group questions and have not been able to find 
>> an answer. I was wondering if there is a way in Julia to read from a formatted 
>> text file, in the same way as matlab's textscan function:
>>
>> http://www.mathworks.com/help/matlab/ref/textscan.html
>>
>> readdlm does not seem to do what I am looking for (or maybe I'm using it 
>> wrong!). Suppose I have data coming from a bed file, which is formatted 
>> like this:
>>
>> chr1  500  34543   1.433
>> chr1  46546  3543   4.68
>> chr2  4543    34456  6.3545
>>
>> It would be nice to specify the format "chr%u %u %u %f" and to get four 
>> vectors (three with Ints and one with floats):
>>
>> [1,1,2]
>> [500,46546,4543]
>> [34543,3543,34456]
>> [1.433,4.68,6.3545]
>>
>> Is there a function to do this? If not, is there a simple way to do this 
>> with the functions that are available? 
>>
>> Thanks!
>> Garrett
>>
>> P.S. I know that @printf basically allows the opposite of this to be done 
>> (i.e., to write out to a file by specifying a format). Basically, my 
>> question is if there exists the equivalent @readf to read in something that 
>> was produced by @printf?
>>
>
>
> A quick and dirty way of doing this would be the following:
>
> t = readdlm(filename)'  # note the transpose at the end
>
>
> There are a couple of issues here:
>
> 1: you'll need to manipulate the "chr" strings (to get rid of "chr") - see 
> also #3, below
> 2: you've got an array instead of a set of vectors (easy to convert using 
> comprehension or other methods)
> 3: the array is of type "{Any,2}" which may lead to inefficiencies. If you 
> could get rid of "chr" before reading so that the file only contains things 
> that parse into numbers, the array would be of type "{Float64,2}" which is 
> more precise, though still not 100% equivalent.
>
> If you want to get more complicated, you could use split() and parse() to 
> do what you need, iterating over each line. 
>
> You could also use DataFrames to do something like this:
>
> t = readtable(filename, separator = ' ', header=false)
>
> but you'd still require some manipulation.
>
> There also may be better / more clever ways, but these are the first that 
> come to mind.
>
