Thanks. The problem I am having is not getting the file from disk to memory. I read the disk file into a text array, limiting each array element to 1.5 million characters (not that I have really had something that big; it is a holdover from pre-v11, where 32k characters was the text var/field limit).
So my file reading scheme is:

open document
repeat
  receive packet (doc_ref;array element;1,500,000)
  if (not EOF)
    add element to array
  end if
until EOF

Then process the text.

Chip

On Thu, 17 Nov 2016 14:14:19 +0100, Arnaud de Montard wrote:
>
>> On 16 Nov 2016 at 20:12, Chip Scheide <[email protected]> wrote:
>>
>> I have a routine which parses text.
>> It seemed to function well, until recently, when I had to feed it 50
>> megs of text (48.3 million characters).
>> The data is Cr delimited, and each line of text is of variable length.
>
> Hi Chip,
> you can't use 'document to text' (since v13 only) and I doubt about
> using 'document to blob' to "load at once" such a big document. For
> my own, I use load at once when the document is small enough in 4D
> 32bits versions (small means <500Mb).
>
> Schematically:
>
> ****
> $trailing_t:=""
> ARRAY TEXT($line_at;0)
> $sizePacket_l:=100000 //to be tuned
> USE CHARACTER SET("UTF-8";0) //example
> $ref_h:=Open document("";"")
> if(ok=1)
>   repeat
>     RECEIVE PACKET($ref_h;$packet_t;$sizePacket_l)
>     $trailing_t:=$trailing_t+$packet_t
>     Explode(->$line_at;"\r") //CR delimited text to array
>     $numberOfLines_l:=Size of array($line_at)
>     $trailing_t:=$line_at{$numberOfLines_l} //keep last line aside
>     For($i_l;1;$numberOfLines_l-1)
>       //do something with $line_at{$i_l}
>     End for
>   until(ok=0)
>   //don't forget last piece here ;-)
>   CLOSE DOCUMENT($ref_h)
> end if
> USE CHARACTER SET(*;0)
> ****
>
> I've used this to import a 6.6 Gbytes text document 2 years ago,
> really fast (of course SSD disk is better). What happens in the "For"
> is another story.
>
> Note 1
> avoid using a stop char in the reading process, it is what makes it slow.
>
> Note 2
> if the document only contains "low ascii chars" (one byte=one char),
> you can:
> - remove 'USE CHARACTER SET'
> - read blob instead of text in 'RECEIVE PACKET'
> - convert each packet with blob to text
> Did not test, but I think it's faster.
>
> --
> Arnaud de Montard

**********************************************************************
4D Internet Users Group (4D iNUG)
FAQ: http://lists.4d.com/faqnug.html
Archive: http://lists.4d.com/archives.html
Options: http://lists.4d.com/mailman/options/4d_tech
Unsub: mailto:[email protected]
**********************************************************************
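
For readers who want to try the "read fixed-size packets, keep the last incomplete line aside" idea from the quoted message outside 4D, here is a minimal sketch in Python. It is not 4D code and not the method used by either poster; the function name iter_cr_lines, the packet_size parameter and the sample text are made up for illustration. It only assumes CR-delimited text read in packets of a fixed number of characters.

```python
import io

def iter_cr_lines(stream, packet_size=100_000):
    """Yield CR-delimited lines from a text stream read in fixed-size packets.

    Hypothetical helper for illustration; packet_size is "to be tuned".
    """
    trailing = ""  # incomplete last line carried over from the previous packet
    while True:
        packet = stream.read(packet_size)
        if not packet:  # an empty read means end of file
            break
        pieces = (trailing + packet).split("\r")
        trailing = pieces.pop()  # last piece may be cut mid-line, keep it aside
        yield from pieces        # every other piece is a complete line
    if trailing:  # the "last piece": text after the final CR, if any
        yield trailing

# Tiny in-memory sample; the small packet size forces lines to span packets.
sample = io.StringIO("alpha\rbeta\rgamma delta\repsilon", newline="")
lines = list(iter_cr_lines(sample, packet_size=4))
print(lines)  # ['alpha', 'beta', 'gamma delta', 'epsilon']
```

With a real file, the stream would presumably be opened with something like open(path, encoding="utf-8", newline="") so that Python does not translate the CR characters. Because each packet is split and consumed as it arrives, only one packet plus one partial line needs to be held in memory at a time.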

