Holger Stratmann created CSV-131:
------------------------------------

             Summary: save positions of records to enable random access
                 Key: CSV-131
                 URL: https://issues.apache.org/jira/browse/CSV-131
             Project: Commons CSV
          Issue Type: Improvement
          Components: Parser
    Affects Versions: 1.1
            Reporter: Holger Stratmann
            Priority: Minor


It would be good to have each {{CSVRecord}} save its position in the source 
stream.

Reason: Knowing the position of the records would enable random access to 
retrieve records from the source (after reading it once to build an index) if 
the file is too large to be read into memory (or if we don't want to read the 
full file to access a record in the middle).

Additional info: I have created a "random access csv reader" and a "csv viewer" 
(Swing) for arbitrarily large CSV files. It requires one additional scan of the 
file to build an index (multi-byte charsets supported). The index can be saved 
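to a file so it only needs to be built once. Because the lexer reads through a 
BufferedReader, which reads ahead of the record currently being parsed, the 
position of the underlying stream does not tell us where a record starts; that 
"internal information" has to come from inside the parser.
The change to "core" is minor: one field in {{CSVRecord}} and some associated 
methods to store the position.
A patch will be attached.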
Code for random access (both UI and non-UI) will be proposed (and possibly 
submitted) as a separate issue. It could also be an independent add-on but 
requires this one little change to Commons CSV.
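
To illustrate the idea of "scan once to build an index, then seek", here is a 
minimal standalone sketch. It is not the attached patch and does not use any 
Commons CSV API: the class {{RecordIndexSketch}} and its methods are 
hypothetical names, it works on byte offsets with only the JDK, and it assumes 
UTF-8 (or another ASCII-compatible encoding), double quotes as the quote 
character and LF or CRLF record separators.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Commons CSV or of the patch.
class RecordIndexSketch {

    /** Scans the file once and returns the byte offset at which each record starts. */
    static List<Long> buildIndex(Path file) throws IOException {
        List<Long> offsets = new ArrayList<>();
        try (InputStream in = new BufferedInputStream(Files.newInputStream(file))) {
            long pos = 0;
            boolean inQuotes = false;
            boolean atRecordStart = true;
            int b;
            while ((b = in.read()) != -1) {
                if (atRecordStart) {
                    offsets.add(pos);
                    atRecordStart = false;
                }
                if (b == '"') {
                    // An escaped quote ("") toggles twice, so the state is unchanged.
                    inQuotes = !inQuotes;
                } else if (b == '\n' && !inQuotes) {
                    // A line break outside quotes ends the current record.
                    atRecordStart = true;
                }
                pos++;
            }
        }
        return offsets;
    }

    /** Seeks directly to one record and returns its raw text (line terminator removed). */
    static String readRecord(Path file, List<Long> index, int recordNumber) throws IOException {
        long start = index.get(recordNumber);
        long end = recordNumber + 1 < index.size()
                ? index.get(recordNumber + 1)
                : Files.size(file);
        byte[] buf = new byte[(int) (end - start)];
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(start);
            raf.readFully(buf);
        }
        int len = buf.length;
        while (len > 0 && (buf[len - 1] == '\n' || buf[len - 1] == '\r')) {
            len--;
        }
        return new String(buf, 0, len, StandardCharsets.UTF_8);
    }
}
```

The raw text returned by {{readRecord}} would still have to be parsed into 
fields. With the proposed change, the offsets could instead be taken from the 
records during the first parse, so the quoting rules would not need to be 
reimplemented as they are in this sketch.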




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
