Thanks, I changed my function to use the new ⎕FIO[49], and the result is
much more compact:
∇Z ← pattern read_csv_n filename
Z← {pattern convert_entry¨ (⍵≠' ') ⊂ ⍵}¨ ⎕FIO[49] filename
∇
It's a bit faster too, as this version runs in 11 seconds.
However, the result is not entirely correct, as this version creates a
1-dimensional array where each element is an array consisting of the values
for one row.
Is there some way I can use EACH to map over the elements and generate a
two-dimensional array?
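
One thing that might work here, assuming every row splits into the same
number of fields, is to apply disclose (monadic ⊃, which in GNU APL follows
the APL2 definition) to the nested result. A minimal sketch on toy data (the
name "rows" is made up for illustration, and this has not been run against
the real file):

```apl
⍝ Toy stand-in for the per-row results; not the real data.
rows ← (1 2 3) (4 5 6)
⍴rows     ⍝ nested vector with one item per row
⍴⊃rows    ⍝ disclose turns it into a rows-by-columns matrix

⍝ Untested guess at the same idea applied to the function body:
⍝ Z ← ⊃ {pattern convert_entry¨ (⍵≠' ') ⊂ ⍵}¨ ⎕FIO[49] filename
```

As far as I understand, disclose pads shorter rows with fill elements rather
than signalling an error, so a malformed line would show up as a padded row.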
Regards,
Elias
On 18 January 2017 at 21:46, Juergen Sauermann <
[email protected]> wrote:
> Hi,
>
> as a start I have added *⎕FIO[49]* in *SVN 851*. It reads an entire
> UTF8 encoded file and puts every line of the
> file into one nested item of the result. Trailing CR and LF are
> removed in the process.
>
> Next step is to turn *⎕FIO[49]* into an operator so that you can give it
> an APL function that converts every line into the
> desired result. Until then you can use it like:
>
> *Z←CONVERT¨Z←⎕FIO[49] 'filename'*
>
> /// Jürgen
>
>
> On 01/18/2017 11:17 AM, Elias Mårtenson wrote:
>
> You've all made good points, and I changed the code slightly to provide
> the initial array size in order to avoid recreating the array on each
> iteration. This brought the loading time down to a much more bearable
> *14 seconds*. I rewrote the Lisp code to be compatible with the APL code
> and the time was *1.46 seconds*. This suggests that GNU APL is about
> 10 times slower than non-optimised Lisp code on this task. To me, this is
> not unexpected given the fact that GNU APL isn't designed to be
> high-performance.
>
> However, while 14 seconds for 30k rows is manageable, I have had the need to
> work with arrays of over a million rows. Extrapolating this suggests that
> it would take almost 8 minutes to load such a file. Thus, unless GNU APL
> can magically improve overall performance by at least 10 times, I still
> think we need a native CSV loading function.
>
> Regards,
> Elias
>
> For reference, here is the APL code:
>
> ∇Z ← type convert_entry value
> →('n'≡type)/numeric
> →('s'≡type)/string
> ⎕ES 'Illegal conversion type'
> numeric:
> Z←⍎value
> →end
> string:
> Z←value
> end:
> ∇
>
> ∇Z ← pattern read_csv_n[n] filename ;fd;line;separator;i
> separator ← ' '
> Z ← n (↑⍴pattern) ⍴ 0
> fd ← 'r' FIO∆fopen filename
> i ← ⎕IO
>
> next:
> line ← FIO∆fgets fd ⍝ Read one line from the file
> →(⍬≡line)/end
> →(10≠line[⍴line])/skip_nl ⍝ If the line ends in a newline
> line ← line[⍳¯1+⍴line] ⍝ Remove the newline
> skip_nl:
> line ← ⎕UCS line
> Z[i;] ← pattern convert_entry¨ (line≠separator) ⊂ line
> i ← i+1
> →next
> end:
>
> FIO∆fclose fd
> ∇
>
> And here is the Lisp code (the test case was run on SBCL). It requires
> the QL packages SPLIT-SEQUENCE and PARSE-NUMBER:
>
> (defparameter *result*
>   (time
>    (with-open-file (s "apjs492452t1_mrt.txt")
>      (let ((res (make-array '(34030 11))))
>        (dotimes (i (array-dimension res 0))
>          (let* ((line (read-line s))
>                 (parts (split-sequence:split-sequence
>                         #\Space line :remove-empty-subseqs t)))
>            (loop
>              for ii from 0 below 10
>              for p in parts
>              do (setf (aref res i ii)
>                       (parse-number:parse-number p)))
>            (setf (aref res i 10) (nth 10 parts))))
>        res))))
>
> On 18 January 2017 at 09:57, Blake McBride <[email protected]> wrote:
>
>> On Tue, Jan 17, 2017 at 7:39 PM, Xiao-Yong Jin <[email protected]>
>> wrote:
>>
>>> I have always felt GNU APL is kind of slow compared to Dyalog, but I
>>> have never really compared the two on a large dataset.
>>> I'm mostly using J now for large datasets.
>>> If Elias has the optimized code for GNU APL and a reproducible way to
>>> measure timing, I'd like to compare it with Dyalog and J.
>>
>>
>> I think that's actually a good idea. It would be a good comparison. It
>> would really make it clear if there is a glaring problem. But first the
>> APL code should be optimized a bit (but nothing crazy like reading it all
>> into memory right now).
>>
>> --blake