You've all made good points, and I changed the code slightly to provide the
initial array size in order to avoid recreating the array on each
iteration. This brought the loading time down to a much more bearable *14
seconds*. I rewrote the Lisp code to do the same work as the APL code, and
its time was *1.46 seconds*. This suggests that GNU APL is consistently
about 10 times slower than non-optimised Lisp code. To me this is not
unexpected, given that GNU APL isn't designed to be high-performance.
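To make the change concrete, here is the general idea sketched in Lisp (the
helper name and dimensions are only illustrative, not the real ones):
allocate the result array once, with its final size, and then fill it in
place, instead of rebuilding or growing it on every row.

(defun make-result-array (rows cols)
  "Allocate the full ROWS x COLS result in a single step."
  (make-array (list rows cols) :initial-element 0))

;; Each row is then filled in place, with no further allocation:
(let ((res (make-result-array 4 3)))
  (dotimes (i 4 res)
    (dotimes (j 3)
      (setf (aref res i j) (* i j)))))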
However, while 14 seconds for 30k rows is manageable, I have needed to
work with arrays of over a million rows. Extrapolating, this suggests it
would take almost 8 minutes to load such a file. Thus, unless GNU APL
can magically improve overall performance by at least 10 times, I still
think we need a native CSV loading function.
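(The 8-minute figure is just a linear scaling of the measured time; a quick
sanity check:)

;; 14 seconds for ~30,000 rows, scaled linearly to 1,000,000 rows:
(/ (* 14 1000000) 30000.0)   ; => ~466.7 seconds, i.e. just under 8 minutes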
Regards,
Elias
For reference, here is the APL code:
∇Z ← type convert_entry value
→('n'≡type)/numeric ⍝ Dispatch on the conversion type character
→('s'≡type)/string
⎕ES 'Illegal conversion type'
numeric:
Z←⍎value ⍝ Evaluate the text as a number
→end
string:
Z←value ⍝ Keep the field as a character vector
end:
∇
∇Z ← pattern read_csv_n[n] filename ;fd;line;separator;i
separator ← ' '
Z ← n (↑⍴pattern) ⍴ 0
fd ← 'r' FIO∆fopen filename
i ← ⎕IO
next:
line ← FIO∆fgets fd ⍝ Read one line from the file
→(⍬≡line)/end ⍝ Stop at end of file
→(10≠line[⍴line])/skip_nl ⍝ Branch past the trim unless the line ends in a newline
line ← line[⍳¯1+⍴line] ⍝ Remove the trailing newline
skip_nl:
line ← ⎕UCS line
Z[i;] ← pattern convert_entry¨ (line≠separator) ⊂ line ⍝ Split on the separator and convert each field
i ← i+1
→next
end:
FIO∆fclose fd
∇
And here is the Lisp code (the test case was run on SBCL); it requires the
Quicklisp packages SPLIT-SEQUENCE and PARSE-NUMBER:
(defparameter *result*
  (time
   (with-open-file (s "apjs492452t1_mrt.txt")
     (let ((res (make-array '(34030 11))))
       (dotimes (i (array-dimension res 0))
         (let* ((line (read-line s))
                (parts (split-sequence:split-sequence #\Space line
                                                      :remove-empty-subseqs t)))
           (loop
             for ii from 0 below 10
             for p in parts
             do (setf (aref res i ii) (parse-number:parse-number p)))
           (setf (aref res i 10) (nth 10 parts))))
       res))))
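To reproduce the Lisp run, the two dependencies can be loaded with Quicklisp
beforehand:

(ql:quickload '(:split-sequence :parse-number))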
On 18 January 2017 at 09:57, Blake McBride <[email protected]> wrote:
> On Tue, Jan 17, 2017 at 7:39 PM, Xiao-Yong Jin <[email protected]>
> wrote:
>
>> I always feel GNU APL is kind of slow compared to Dyalog, but I never really
>> compared the two on a large dataset.
>> I'm mostly using J now for large datasets.
>> If Elias has the optimized code for GNU APL and a reproducible way to
>> measure timing, I'd like to compare it with Dyalog and J.
>
>
> I think that's actually a good idea. It would be a good comparison. It
> would really make it clear whether there is a glaring problem. But first the
> APL code should be optimized a bit (but nothing crazy like reading it all
> into memory right now).
>
> --blake