[ https://issues.apache.org/jira/browse/CSV-131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14129051#comment-14129051 ]
Holger Stratmann commented on CSV-131:
--------------------------------------
Agreed. I had already realized that I forgot to supply proper tests for "core" (I
do have tests in my own code that uses these changes, but those live in
different (new) packages). I had planned to add the tests this weekend, but
while I have your attention, let's do it now: I attached a new patch that
contains the updated test case.
As you can see, I added the method "setNextRecordNumber" so that records get the
correct record numbers even if we start reading at an arbitrary record.
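A minimal sketch of what such a method buys us follows. It is not the code from the patch: `SimpleLineParser` is a hypothetical class that treats every line as one record (no quoted line breaks) and only borrows the method name from this comment. Reading from the third record after calling `setNextRecordNumber(3)` gives that record the same number it gets in a full read.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical one-record-per-line parser; NOT the Commons CSV API. */
class SimpleLineParser {
    private final BufferedReader reader;
    private long nextRecordNumber = 1; // numbering starts at 1 in this sketch

    SimpleLineParser(Reader reader) {
        this.reader = new BufferedReader(reader);
    }

    /** Tells the parser which number the first record it reads should get. */
    void setNextRecordNumber(long n) {
        this.nextRecordNumber = n;
    }

    /** Returns "number:line" strings so the numbering is easy to inspect. */
    List<String> readAll() {
        List<String> out = new ArrayList<>();
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                out.add(nextRecordNumber++ + ":" + line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // Full read: records are numbered 1, 2, 3.
        System.out.println(new SimpleLineParser(new StringReader("a\nb\nc\n")).readAll()); // [1:a, 2:b, 3:c]

        // Start at the third record (e.g. after jumping there via an index).
        SimpleLineParser tail = new SimpleLineParser(new StringReader("c\n"));
        tail.setNextRecordNumber(3);
        System.out.println(tail.readAll()); // [3:c]
    }
}
```

Keeping the counter inside the parser, rather than renumbering records afterwards, means every record is created with its final number.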
I'm wondering whether I / we should also add "setCharacterPosition" (I don't like
that name because it sounds like it does seeking; better ideas?) to tell the
parser at which position in the "original source" we are when we start reading.
Right now, when I skip to some record and start reading, the returned record
gets characterPosition == 0. I guess it would be nicer if the records were
completely identical no matter where we started reading them.
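The character-position idea can be sketched the same way. The hypothetical `OffsetLineParser` below counts the characters it consumes and starts counting from a caller-supplied offset; `setCharacterPosition` is only the placeholder name from this comment, not an existing API. The sketch assumes one record per `\n`-terminated line. Without the offset, a record read after skipping reports position 0; with it, the positions match those of a full read.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical parser that reports the absolute character position of each record. */
class OffsetLineParser {
    private final Reader reader;
    private long position; // absolute position (in chars) of the next character to read

    OffsetLineParser(Reader reader) {
        this.reader = reader; // single-char reads for clarity; wrap real files in a BufferedReader
    }

    /** Placeholder for the proposed setter: where in the original source reading starts. */
    void setCharacterPosition(long startOffset) {
        this.position = startOffset;
    }

    /** Returns the absolute start position of every '\n'-terminated record. */
    List<Long> recordPositions() {
        List<Long> starts = new ArrayList<>();
        boolean atRecordStart = true;
        try {
            int c;
            while ((c = reader.read()) != -1) {
                if (atRecordStart) {
                    starts.add(position);
                    atRecordStart = false;
                }
                position++;
                if (c == '\n') {
                    atRecordStart = true;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return starts;
    }

    public static void main(String[] args) {
        String csv = "a,b\nc,d\ne,f\n";
        System.out.println(new OffsetLineParser(new StringReader(csv)).recordPositions()); // [0, 4, 8]

        // Resume at the second record without an offset: it reports position 0.
        System.out.println(new OffsetLineParser(new StringReader(csv.substring(4))).recordPositions()); // [0, 4]

        // With the offset, the positions are identical to the full read.
        OffsetLineParser resumed = new OffsetLineParser(new StringReader(csv.substring(4)));
        resumed.setCharacterPosition(4);
        System.out.println(resumed.recordPositions()); // [4, 8]
    }
}
```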
> save positions of records to enable random access
> -------------------------------------------------
>
> Key: CSV-131
> URL: https://issues.apache.org/jira/browse/CSV-131
> Project: Commons CSV
> Issue Type: Improvement
> Components: Parser
> Affects Versions: 1.1
> Reporter: Holger Stratmann
> Priority: Minor
> Attachments: PositionTracking_20140907.patch
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> It would be good to have {{CSVRecord}} save its position in the source stream.
> Reason: Knowing the position of the records would enable random access to
> retrieve records from the source (after reading it once to build an index) if
> the file is too large to be read into memory (or if we don't want to read the
> full file to access a record in the middle).
> Additional info: I have created a "random access csv reader" and a "csv
> viewer" (Swing) for arbitrarily large CSV files. It requires one additional
> scan of the file to build an index (multi-byte charsets supported). The index
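> can be saved to a file so it only needs to be built once. Because the lexer
> reads through a BufferedReader, which pulls data from the source in large
> chunks, the position of the underlying stream does not tell us where a record
> starts; we need this "internal information" from the parser.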
> The change to "core" is minor: one field in {{CSVRecord}} and some
> associated methods to store the position.
> Patch will be attached.
> Code for random access (both UI and non-UI) will be proposed (and possibly
> submitted) as a separate issue. It could also be an independent add-on, but it
> requires this one little change to Commons CSV.
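The random-access scheme in the issue description comes down to two steps: one sequential pass that remembers where each record starts, and later a jump to a stored position to read a single record. The sketch below assumes one record per line (no quoted line breaks) and stores character positions; `CsvIndex`, `build` and `readRecord` are hypothetical names, not part of Commons CSV or of the proposed viewer. On a decoding reader such as `InputStreamReader`, `skip` still decodes everything it skips, so for very large files one would rather translate positions into byte offsets and seek directly.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical index of record start positions, measured in characters. */
class CsvIndex {
    private final List<Long> starts = new ArrayList<>();

    /** One sequential pass: remember the character position where each line starts. */
    static CsvIndex build(Reader source) {
        CsvIndex index = new CsvIndex();
        try (Reader in = new BufferedReader(source)) {
            long pos = 0;
            boolean atStart = true;
            int c;
            while ((c = in.read()) != -1) {
                if (atStart) {
                    index.starts.add(pos);
                    atStart = false;
                }
                pos++;
                if (c == '\n') atStart = true;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return index;
    }

    int size() {
        return starts.size();
    }

    /** Opens a fresh reader, skips to the stored start position and reads that one line. */
    String readRecord(Supplier<Reader> reopen, int recordIndex) {
        long target = starts.get(recordIndex);
        try (BufferedReader in = new BufferedReader(reopen.get())) {
            long skipped = 0;
            while (skipped < target) { // skip() may skip fewer chars than requested
                long n = in.skip(target - skipped);
                if (n <= 0) throw new IOException("unexpected end of input");
                skipped += n;
            }
            return in.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String csv = "id,name\n1,Ann\n2,Bob\n3,Cy\n";
        CsvIndex index = CsvIndex.build(new StringReader(csv));
        System.out.println(index.size()); // 4
        System.out.println(index.readRecord(() -> new StringReader(csv), 2)); // 2,Bob
    }
}
```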
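The `BufferedReader` point in the description can be checked with plain JDK classes. In the sketch below (hypothetical `BufferPositionDemo`), a counting wrapper sits between the source and the `BufferedReader`. After the consumer has read only the first record, the source has already handed over the whole small input, because the buffer is filled in one large chunk. Only the count kept on the consumer side, which is what the parser itself would have to track, says where the next record starts.

```java
import java.io.BufferedReader;
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class BufferPositionDemo {
    /** Counts the characters the wrapped source hands over to its caller. */
    static final class CountingReader extends FilterReader {
        long count;

        CountingReader(Reader in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            int c = super.read();
            if (c != -1) count++;
            return c;
        }

        @Override
        public int read(char[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            if (n > 0) count += n;
            return n;
        }
    }

    /** Returns {chars consumed for the first record, chars pulled from the source}. */
    static long[] firstRecordCounts(String csv) {
        CountingReader source = new CountingReader(new StringReader(csv));
        try (BufferedReader buffered = new BufferedReader(source)) {
            long consumed = 0;
            int c;
            while ((c = buffered.read()) != -1) {
                consumed++;
                if (c == '\n') break; // end of the first record
            }
            return new long[] {consumed, source.count};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long[] counts = firstRecordCounts("a,b,c\nd,e,f\ng,h,i\n");
        System.out.println("consumed by the lexer: " + counts[0]);  // 6
        System.out.println("pulled from the source: " + counts[1]); // 18
    }
}
```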