[ https://issues.apache.org/jira/browse/CSV-131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188135#comment-14188135 ]
Holger Stratmann commented on CSV-131:
--------------------------------------
Sorry for taking so long to comment. I was waiting for other people's
comments, was busy, and then on vacation... I probably don't have time today,
but will comment soon. First thoughts: I came to the conclusion that we *don't*
need the setters, and that it would be a nice, clean solution to pass these via
an alternate constructor instead. I wanted to prepare an alternative patch for
that...
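A minimal sketch of the alternate-constructor idea, assuming a toy line-based format (no quoting or escaping). OffsetLineParser, Rec, and the parameter names are hypothetical and are not the Commons CSV API: the caller passes the starting character offset and the next record number once, at construction time, and each record keeps its position in a final field, so no setters are needed.

```java
import java.io.IOException;
import java.io.Reader;

/**
 * Hypothetical toy parser (not the Commons CSV API): the starting position and
 * record number are supplied once via an alternate constructor; there are no setters.
 * Records are split on '\n' and fields on ','; quoting is not supported.
 */
final class OffsetLineParser {

    /** Immutable record: all fields are final, so a position cannot be patched in later. */
    static final class Rec {
        final long recordNumber;
        final long characterPosition; // offset of the record's first char in the whole source
        final String[] values;

        Rec(long recordNumber, long characterPosition, String[] values) {
            this.recordNumber = recordNumber;
            this.characterPosition = characterPosition;
            this.values = values;
        }
    }

    private final Reader in;
    private long position;         // chars consumed so far, counted from the start of the source
    private long nextRecordNumber; // number to assign to the next record

    /** Default constructor: parsing starts at the beginning of the source. */
    OffsetLineParser(Reader in) {
        this(in, 0L, 1L);
    }

    /** Alternate constructor: resume at a known offset, e.g. after seeking to an indexed record. */
    OffsetLineParser(Reader in, long characterOffset, long nextRecordNumber) {
        this.in = in;
        this.position = characterOffset;
        this.nextRecordNumber = nextRecordNumber;
    }

    /** Returns the next record, or null at end of input. */
    Rec nextRecord() throws IOException {
        long start = position;
        StringBuilder line = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            position++;
            if (c == '\n') {
                break;
            }
            line.append((char) c);
        }
        if (c == -1 && line.length() == 0) {
            return null; // nothing left to read
        }
        int len = line.length();
        if (len > 0 && line.charAt(len - 1) == '\r') {
            line.setLength(len - 1); // tolerate CRLF line endings
        }
        return new Rec(nextRecordNumber++, start, line.toString().split(",", -1));
    }
}
```

For example, constructing the parser with a character offset of 100 and a next record number of 7 on a reader already positioned at character 100 reports the first record as record 7 at position 100; nothing has to be written into the record after it is created.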
As for the decorator: hmm, off the top of my head, I don't think that would
work well. The decorator wouldn't really *add* any functionality. The parser
itself does not seek (seeking is done by a different class, which I did indeed
implement as a decorator, of sorts). Also, the decorator would need access to
functionality that is not currently exposed. On top of that, CSVRecord has only
final fields, so the "decorator" would have to override huge amounts of code
to create different CSVRecords... or you'd have to make the record number and
position writable (which doesn't seem really useful).
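To illustrate why a wrapper outside the parser lacks the information it needs: the issue notes that the lexer reads through a BufferedReader, and a hypothetical CountingReader placed beneath a BufferedReader counts how many characters the buffer pulled, not where the line just returned begins or ends. CountingReader and BufferingDemo are illustrative names only; the point is standard java.io buffering behaviour.

```java
import java.io.BufferedReader;
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

/** Hypothetical helper: counts characters pulled from the wrapped reader. */
final class CountingReader extends FilterReader {
    private long count;

    CountingReader(Reader in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int c = super.read();
        if (c != -1) {
            count++;
        }
        return c;
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        int n = super.read(cbuf, off, len);
        if (n > 0) {
            count += n;
        }
        return n;
    }

    long count() {
        return count;
    }
}

final class BufferingDemo {
    /** Reads one line through a BufferedReader and reports how many chars were pulled underneath it. */
    static long charsPulledForFirstLine(String source) throws IOException {
        CountingReader counter = new CountingReader(new StringReader(source));
        BufferedReader buffered = new BufferedReader(counter);
        buffered.readLine(); // returns only the first line...
        return counter.count(); // ...but the buffer has already read ahead
    }
}
```

After a single readLine() on "a,b\nc,d\ne,f\n", the count is already past the first line's 4 characters, so record start positions can only be tracked inside the component that consumes the buffer, i.e. the lexer.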
I'll come back with a more thorough analysis as soon as I can...
> Save positions of records to enable random access
> -------------------------------------------------
>
> Key: CSV-131
> URL: https://issues.apache.org/jira/browse/CSV-131
> Project: Commons CSV
> Issue Type: Improvement
> Components: Parser
> Affects Versions: 1.1
> Reporter: Holger Stratmann
> Priority: Minor
> Attachments: CSV-131-gg-0.diff,
> PositionTrackingFull_v101_20140910.patch,
> PositionTrackingTest_20140907.patch, PositionTracking_20140907.patch,
> ggregory-CSV-131-parser-and-record.diff
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> It would be good to have {{CSVRecord}} save its position in the source stream.
> Reason: Knowing the position of the records would enable random access to
> retrieve records from the source (after reading it once to build an index) if
> the file is too large to be read into memory (or if we don't want to read the
> full file to access a record in the middle).
> Additional info: I have created a "random access CSV reader" and a "CSV
> viewer" (Swing) for arbitrarily large CSV files. It requires one additional
> scan of the file to build an index (multi-byte charsets supported). The index
> can be saved to a file so it only needs to be built once. Because the lexer
> uses a BufferedReader, we need "internal information" to know where each
> record starts.
> The change to "core" is minor: one field in {{CSVRecord}} and some
> associated methods to store the position.
> Patch will be attached.
> Code for random access (both UI and non-UI) will be proposed (and possibly
> submitted) as a separate issue. It could also be an independent add-on but
> requires this one little change to Commons CSV.
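A minimal sketch of the index-then-seek approach described in the issue, under simplifying assumptions: one record per line (no quoted fields containing line breaks), UTF-8 or another ASCII-compatible encoding in which the byte 0x0A only ever encodes '\n', and a hypothetical LineIndex class that is not part of Commons CSV. One sequential pass records the byte offset at which each line starts; afterwards, any line can be fetched with RandomAccessFile.seek without reading the lines before it. Because the scan counts the bytes it consumes itself, it can safely use a buffered stream, and it sidesteps the character-to-byte mapping that multi-byte charsets otherwise require.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical byte-offset index over a one-record-per-line UTF-8 file. */
final class LineIndex {
    private final Path file;
    private final long[] starts; // starts[i] = byte offset where line i begins

    private LineIndex(Path file, long[] starts) {
        this.file = file;
        this.starts = starts;
    }

    /** One sequential pass: line 0 starts at 0, every later line starts right after a '\n'. */
    static LineIndex build(Path file) throws IOException {
        long size = Files.size(file);
        List<Long> offsets = new ArrayList<>();
        if (size > 0) {
            offsets.add(0L);
        }
        try (InputStream in = new BufferedInputStream(Files.newInputStream(file))) {
            long pos = 0;
            int b;
            while ((b = in.read()) != -1) {
                pos++;
                if (b == '\n' && pos < size) { // no entry for the newline that ends the file
                    offsets.add(pos);
                }
            }
        }
        long[] starts = new long[offsets.size()];
        for (int i = 0; i < starts.length; i++) {
            starts[i] = offsets.get(i);
        }
        return new LineIndex(file, starts);
    }

    int size() {
        return starts.length;
    }

    /** Random access: seek directly to line i and decode only its bytes. */
    String line(int i) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long from = starts[i];
            long to = (i + 1 < starts.length) ? starts[i + 1] : raf.length();
            byte[] buf = new byte[(int) (to - from)];
            raf.seek(from);
            raf.readFully(buf);
            String s = new String(buf, StandardCharsets.UTF_8);
            if (s.endsWith("\n")) {
                s = s.substring(0, s.length() - 1);
            }
            if (s.endsWith("\r")) {
                s = s.substring(0, s.length() - 1);
            }
            return s;
        }
    }
}
```

The long[] of offsets is what would be persisted (for example with DataOutputStream.writeLong) so that the scan only has to happen once per file.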