[jira] [Work logged] (CSV-253) Handle absent values in input (null)

ASF GitHub Bot (Jira) Mon, 14 Dec 2020 00:57:05 -0800


     [ 
https://issues.apache.org/jira/browse/CSV-253?focusedWorklogId=523754&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-523754
 ]


ASF GitHub Bot logged work on CSV-253:
--------------------------------------

                Author: ASF GitHub Bot
            Created on: 14/Dec/20 08:56
            Start Date: 14/Dec/20 08:56
    Worklog Time Spent: 10m 
      Work Description: lbruun commented on pull request #51:
URL: https://github.com/apache/commons-csv/pull/51#issuecomment-744287248


   @garydgregory I think what you are essentially saying is to do [PR 
77](https://github.com/apache/commons-csv/pull/77) rather than this one. Either 
way will work.
   
   It is all about the appetite for breaking changes (PR77) vs a new opt-in 
feature (this PR). On one hand you can argue that the library should have been 
able to distinguish between empty string and absent string from the very 
beginning and therefore fixing it by changing existing behavior is justified, 
meaning do PR77. But in the world of CSV only a single generally agreed-upon 
standard exists, RFC4180, and that one is inconclusive about this scenario. We 
also have the [W3C's Recommendation for Tabular 
Data](https://www.w3.org/TR/tabular-data-model/), but that one clearly 
states that the empty string and the absent string are the same. This suggests 
that the way the library currently works should remain the default and 
therefore that a breaking change is not acceptable.
   
   Example: Let's say a user is currently using the static format `POSTGRESQL_CSV` 
to parse output which originates from a PostgreSQL database. With PR77, such a 
user may start to see NPEs just because they updated to the latest 
version of Apache Commons CSV. Ouch!
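   
   To make the NPE risk concrete, here is a minimal sketch. It does not use the 
Commons CSV API: a plain `String` stands in for a parsed field, and the class 
and method names (`NullFieldSketch`, `legacyLabel`, `safeLabel`) are 
hypothetical. It assumes caller code that was written while absent values 
arrived as empty strings and that now receives `null` instead.

```java
public class NullFieldSketch {

    /** Caller code written on the assumption that a field is never null. */
    public static String legacyLabel(String field) {
        // Throws NullPointerException if field is null.
        return field.isEmpty() ? "(blank)" : field.trim();
    }

    /** Null-tolerant variant that treats absent and empty fields alike. */
    public static String safeLabel(String field) {
        return (field == null || field.isEmpty()) ? "(blank)" : field.trim();
    }

    public static void main(String[] args) {
        System.out.println(legacyLabel(""));   // empty string is handled
        System.out.println(safeLabel(null));   // null is handled
        try {
            legacyLabel(null);                 // a null field breaks legacy code
        } catch (NullPointerException e) {
            System.out.println("NPE for a null field");
        }
    }
}
```

   Every call site shaped like `legacyLabel` would need a null check after the 
upgrade, which is the migration cost of the breaking change.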
   
   However, what speaks in favor of a breaking change, meaning PR77, is that 
most users will possibly (unfairly?) expect a CSV format to be _lossless_, 
meaning that if they export from their database into CSV and then re-import 
that CSV file into the same database then no data has been lost or distorted. With 
PR77 you uphold the principle of lossless conversion and if this is what users 
implicitly expect from the library then PR77 is the way to go.
   
   Personally I think I'm leaning towards PR77, even if it is a breaking 
change. It just needs clear documentation, 
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 523754)
    Time Spent: 1h 20m  (was: 1h 10m)

> Handle absent values in input (null)
> ------------------------------------
>
>                 Key: CSV-253
>                 URL: https://issues.apache.org/jira/browse/CSV-253
>             Project: Commons CSV
>          Issue Type: Improvement
>          Components: Parser
>            Reporter: Lars Bruun-Hansen
>            Priority: Major
>         Attachments: 2019-10-30 20_31_39-Apache Commons CSV 1.8-SNAPSHOT 
> API.png, Parser-setting-absentIsNull-Javadoc.png
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The parser must be able to handle absent values in input and translate that 
> into {{null}} as required. I see several tickets on this matter in the 
> history, but none seem to have addressed the issue, at least not for parsing. 
> For this problem, I see a need to introduce a new term:
> Definition: _Absent value_ is when there are zero characters between field 
> delimiters.
> Specifically the aim is to be able to parse the following:
> {noformat}
>     "John",,"Doe"    // 2nd element is absent
>     ,"AA",123        // 1st element is absent
>     "John",90,       // 3rd element is absent
>     "",,90           // 2nd element is absent (1st element isn't)
> {noformat}
>  
> See also CSV-93 which I think never addressed the issue, probably because the 
> reporter was happy with having the issue fixed for CSV output, not for 
> parsing.
> A PR is coming...
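
The definition of an absent value quoted above can be illustrated with a small 
standalone sketch. This is not the Commons CSV parser and not the code from 
either PR; the class `AbsentValueSketch` and its methods are hypothetical, and 
the sketch assumes a comma delimiter, double-quote quoting, and single-line 
records. A field with zero characters between delimiters becomes null, while 
a quoted empty field ("") stays an empty string.

```java
import java.util.ArrayList;
import java.util.List;

public class AbsentValueSketch {

    /** Splits one CSV line; an absent field (nothing between delimiters) becomes null. */
    public static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;   // currently inside a quoted section
        boolean wasQuoted = false;  // current field contained quotes, so it is not absent
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // doubled quote is an escaped quote
                        i++;
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    current.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;
                wasQuoted = true;
            } else if (c == ',') {
                fields.add(toValue(current, wasQuoted));
                current.setLength(0);
                wasQuoted = false;
            } else {
                current.append(c);
            }
        }
        fields.add(toValue(current, wasQuoted)); // last field on the line
        return fields;
    }

    /** Returns null only for an unquoted field with zero characters. */
    private static String toValue(StringBuilder raw, boolean wasQuoted) {
        return (raw.length() == 0 && !wasQuoted) ? null : raw.toString();
    }

    public static void main(String[] args) {
        System.out.println(parseLine("\"John\",,\"Doe\"")); // 2nd element is absent
        System.out.println(parseLine(",\"AA\",123"));       // 1st element is absent
        System.out.println(parseLine("\"John\",90,"));      // 3rd element is absent
        System.out.println(parseLine("\"\",,90"));          // 1st is empty, 2nd is absent
    }
}
```

The `wasQuoted` flag is what separates the two cases: without it, "" and an 
absent field would both collapse to a zero-length buffer.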



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
