Thank you very much for the feedback!
Since there is no efficient "textscan"-like function at this time, your
DataFrames suggestion seems like the most sensible route for me: the package
appears to be built for something close to my original goal (along with a lot
of other statistical bells and whistles), so I would guess it is more
efficient than the {Any,2} array that readdlm returns.
If I do end up doing more complicated things using split() and parse(), I
will post my solution back here with my code in case others come looking
for a similar solution.
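
For anyone who lands here in the meantime, a minimal sketch of the
split()/parse() route might look something like the following. It is only an
illustration under some assumptions: a recent Julia version (1.x), every data
line having exactly four whitespace-separated fields, and chromosome names
that are always "chr" followed by an integer (so "chrX" would fail). The
function name read_bed is made up for this example and is not part of Base
or any package.

```julia
# Sketch: parse lines such as "chr1 500 34543 1.433" into four typed vectors.
# `read_bed` is a hypothetical helper name chosen for illustration.
function read_bed(io::IO)
    chrom = Int[]
    start = Int[]
    stop  = Int[]
    score = Float64[]
    for line in eachline(io)
        fields = split(line)             # split on any run of whitespace
        length(fields) == 4 || continue  # skip blank or malformed lines
        # Assumption: the first field is "chr" followed by an integer,
        # so drop the first three characters before parsing.
        push!(chrom, parse(Int, fields[1][4:end]))
        push!(start, parse(Int, fields[2]))
        push!(stop,  parse(Int, fields[3]))
        push!(score, parse(Float64, fields[4]))
    end
    return chrom, start, stop, score
end
```

It could be called as open(read_bed, filename), which passes the opened
stream to the function and closes the file afterwards. Each vector comes back
with a concrete element type (Int or Float64) rather than Any.
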
On Friday, June 26, 2015 at 9:06:08 AM UTC-5, Seth wrote:
>
>
>
> On Tuesday, June 23, 2015 at 10:45:11 PM UTC-5, Garrett Jenkinson wrote:
>>
>> Sorry if this is an overly basic question, but I searched around the
>> documentation and the user group questions and have not been able to find
>> an answer. I was wondering if there is a way in Julia to read from a
>> formatted text file, in the same way as MATLAB's textscan function:
>>
>> http://www.mathworks.com/help/matlab/ref/textscan.html
>>
>> readdlm does not seem to do what I am looking for (or maybe I'm using it
>> wrong!). Suppose I have data coming from a bed file, which is formatted
>> like this:
>>
>> chr1 500 34543 1.433
>> chr1 46546 3543 4.68
>> chr2 4543 34456 6.3545
>>
>> It would be nice to specify the format "chr%u %u %u %f" and to get four
>> vectors (three with Ints and one with floats):
>>
>> [1,1,2]
>> [500,46546,4543]
>> [34543,3543,34456]
>> [1.433,4.68,6.3545]
>>
>> Is there a function to do this? If not, is there a simple way to do this
>> with the functions that are available?
>>
>> Thanks!
>> Garrett
>>
>> P.S. I know that @printf basically allows the opposite of this to be done
>> (i.e., to write out to a file by specifying a format). Basically, my
>> question is whether there exists an equivalent @readf to read in something
>> that was produced by @printf.
>>
>
>
> A quick and dirty way of doing this would be the following:
>
> t = readdlm(filename)' # note the transpose at the end
>
>
> There are a couple of issues here:
>
> 1: you'll need to manipulate the "chr" strings (to get rid of "chr") - see
> also #3, below
> 2: you've got an array instead of a set of vectors (easy to convert using
> comprehension or other methods)
> 3: the array is of type "{Any,2}", which may lead to inefficiencies. If you
> could get rid of "chr" before reading so that the file only contains things
> that parse into numbers, the array would be of type "{Float64,2}", which is
> more precise, though still not 100% equivalent.
>
> If you want to get more complicated, you could use split() and parse() to
> do what you need, iterating over each line.
>
> You could also use DataFrames to do something like this:
>
> t = readtable(filename, separator = ' ', header=false)
>
> but you'd still require some manipulation.
>
> There also may be better / more clever ways, but these are the first that
> come to mind.
>