[ https://issues.apache.org/jira/browse/CSV-196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16574019#comment-16574019 ]
Serge P. Nekoval commented on CSV-196:
--------------------------------------
FYI, I've submitted a patch (CSV-229) with a similar feature. I'm not sure how
the two compare.
> Store the information of raw data read by lexer
> -----------------------------------------------
>
> Key: CSV-196
> URL: https://issues.apache.org/jira/browse/CSV-196
> Project: Commons CSV
> Issue Type: Improvement
> Components: Parser
> Affects Versions: 1.4
> Reporter: Matt Sun
> Priority: Major
> Labels: patch
> Original Estimate: 48h
> Remaining Estimate: 48h
>
> It would be good for the CSVParser class to store whether a field was
> enclosed in quotes in the original source file.
> For example, for this data sample:
> A, B, C
> a1, "b1", c1
> CSVParser gives us the record a1, b1, c1, which is helpful because it has
> parsed the double quotes, but information about the original data is lost at
> the same time. We can't tell from the returned CSVRecord whether the original
> field was enclosed in double quotes.
> In our use case, we are integrating Apache Hadoop APIs with Commons CSV. CSV
> is one of the input formats for Hadoop jobs, which must support splitting
> input data. To split a CSV file into pieces accurately, we need to count the
> bytes of data that CSVParser actually read. CSVParser has no accurate
> information about whether a field was enclosed in quotes, nor does it store
> the raw data of the original source, so downstream users of CSVParser cannot
> get this information.
> Suggested fix: extend the token/CSVRecord with a boolean field indicating
> whether the column was enclosed in quotes. While the Lexer is doing
> getNextToken, set the flag if a field is encapsulated and successfully parsed.
> I found another reported issue with a similar request, but it was marked as
> resolved: [CSV-91]
> https://issues.apache.org/jira/browse/CSV-91?jql=project%20%3D%20CSV%20AND%20text%20~%20%22with%20quotes%22
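
For illustration only, here is a minimal sketch of the idea in the quoted
description: keep a boolean next to each parsed value that records whether the
field was enclosed in double quotes. This is not the Commons CSV Lexer or
CSVRecord, and it is not the patch from either issue. The names
QuotedFlagSketch, Field, parseLine and finish are hypothetical. It assumes a
single line, a comma delimiter, and "" as the escape for a quote inside a
quoted field, and it does not reject unterminated quotes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone sketch; not the Commons CSV Lexer or CSVRecord.
final class QuotedFlagSketch {

    /** A parsed field plus a flag recording whether it was quoted in the source. */
    static final class Field {
        final String value;
        final boolean quoted;

        Field(String value, boolean quoted) {
            this.value = value;
            this.quoted = quoted;
        }
    }

    /** Splits one CSV line on commas, honoring double quotes and "" escapes. */
    static List<Field> parseLine(String line) {
        List<Field> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;   // currently inside an open quoted section
        boolean wasQuoted = false;  // the current field began with a quote

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c != '"') {
                    current.append(c);
                } else if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"');  // "" inside quotes is a literal quote
                    i++;
                } else {
                    inQuotes = false;     // closing quote
                }
            } else if (c == ',') {
                fields.add(finish(current, wasQuoted));
                current.setLength(0);
                wasQuoted = false;
            } else if (c == '"' && !wasQuoted && current.toString().trim().isEmpty()) {
                // Opening quote at the start of a field (leading blanks dropped).
                current.setLength(0);
                inQuotes = true;
                wasQuoted = true;
            } else if (!(wasQuoted && Character.isWhitespace(c))) {
                // Padding after a closing quote is skipped; anything else is kept.
                current.append(c);
            }
        }
        fields.add(finish(current, wasQuoted));
        return fields;
    }

    /** Unquoted fields are trimmed; quoted fields keep their content verbatim. */
    private static Field finish(StringBuilder raw, boolean quoted) {
        String text = raw.toString();
        return new Field(quoted ? text : text.trim(), quoted);
    }

    public static void main(String[] args) {
        // The second data line from the sample in the description.
        for (Field f : parseLine("a1, \"b1\", c1")) {
            System.out.println(f.value + " quoted=" + f.quoted);
        }
    }
}
```

For the sample line a1, "b1", c1 the sketch yields the values a1, b1, c1 with
only b1 flagged as quoted. A caller that needs to account for the bytes
consumed could then add back the two quote characters for each flagged field,
although escaped quotes and surrounding padding would still need separate
bookkeeping.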