https://bugs.documentfoundation.org/show_bug.cgi?id=150141

--- Comment #3 from Pierre Fortin <[email protected]> ---
(In reply to Roman Kuznetsov from comment #2)
> Calc supports only ~1 million rows by default

Yes, but this report is against the new jumbo feature of 16M rows which is
almost enough for what my team needs.

> Anyway please attach your CSV here

Plenty of examples are available at https://dl.ncsbe.gov/?prefix=data/ -- look for
the big files...
These zip files mostly contain a single .txt (mostly tab-separated "csv")...
See also the Snapshots folder...   Files may contain tab- or comma-separated
data, but vary in encoding.  If the 16-bit (UTF-16) encoded files give you
trouble, you can use the Linux command:
  tr -d '\000"\r\377\376\275' < infile.txt > outfile.csv
to "clean" them up (it strips NULs, double quotes, CRs, the UTF-16 BOM bytes
0xFF/0xFE, and 0xBD)...
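As an aside, a gentler alternative to the byte-level tr pass is to convert the UTF-16 file to UTF-8 first, then only strip the CRs. This is just a sketch; the file names are illustrative, and it builds a tiny UTF-16LE sample inline so it can be run anywhere:

```shell
# Build a tiny UTF-16LE sample with a BOM ("a,b" plus CRLF) -- stands in
# for one of the downloaded .txt files.
printf '\xff\xfea\x00,\x00b\x00\r\x00\n\x00' > infile.txt

# iconv consumes the BOM and handles multibyte characters that a raw
# tr -d over bytes could mangle; tr then drops the Windows CRs.
iconv -f UTF-16 -t UTF-8 infile.txt | tr -d '\r' > outfile.csv
cat outfile.csv
```

On the sample above this leaves a plain UTF-8 "a,b" line in outfile.csv.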

Cool! This daily build has a progress bar...  Loading a 5.7GB sheet: the
progress bar reached the end 1 minute ±5 seconds after starting the load.
Waited for the sheet to display... after another 1:40, the load failed with
"too many rows"; less than a minute after clicking OK, the sheet appeared.
HUGE speed improvement over initial tests about a week ago.  Sheet showing
16,777,216
rows. This file is from
https://s3.amazonaws.com/dl.ncsbe.gov/data/ncvhis_Statewide.zip

$ ll ncvhis_Statewide-20220723-070658.csv
[snip] 4265533961 Jul 23 07:06 ncvhis_Statewide-20220723-070658.csv
$ wc -l ncvhis_Statewide-20220723-070658.csv
33686293 ncvhis_Statewide-20220723-070658.csv
^^^^^^^^ 
Even if Calc doubled the number of jumbo rows to 33,554,432, I'd still leave
131,861 rows on the cutting room floor...  :)  While it would be great to load
such sheets whole, we have to split them up.  I have one sheet covering
2012-2022 which we reduced to around 77M records...  but seriously, 16M rows is
something we'd be happy with for a while.  We have lots of ways to slice and
dice these large sheets, and 16M rows is a big help; I'm using the daily builds
almost exclusively when they work...
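For what it's worth, the splitting we do can be sketched with standard tools: peel off the header, cut the data rows into chunks that fit under the 16,777,216-row limit, and put the header back on each chunk. The file names (big.csv, chunk_*) and the tiny 3-row chunk size are illustrative; for real files the split size would be 16777215 to leave room for the header:

```shell
# Demo input: a header plus 5 data rows (stands in for the huge export).
printf 'h1,h2\n' > big.csv
seq 1 5 | sed 's/$/,x/' >> big.csv

# Split the data rows (not the header) into 3-row chunks.
tail -n +2 big.csv | split -l 3 - chunk_

# Prepend the header to each chunk so every piece opens as a full CSV.
for f in chunk_*; do
  { head -n 1 big.csv; cat "$f"; } > "$f.csv" && rm "$f"
done
ls chunk_*.csv
```

This produces chunk_aa.csv (header + 3 rows) and chunk_ab.csv (header + 2 rows), each loadable on its own.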
