Hey sebb,

On Thu, 23 Aug 2018 at 01:23, sebb <seb...@gmail.com> wrote:
> On 23 August 2018 at 00:01, Bruno P. Kinoshita
> <brunodepau...@yahoo.com.br.invalid> wrote:
> >
> > >>Maybe I'm just not getting it, but it feels pretty messed up :-)
> >
> > Mutual feeling, and +1 for consistency. From what I understood, users should be able to parse these crazy CSVs, but if they tried to re-create them, with comments, they wouldn't be able to avoid the println/newline (so the result wouldn't be parseable later with the same reader).
> >
> > We probably need a ticket for it to aggregate the discussion and maybe a possible solution.
>
> I'm wondering whether we need to be as flexible when *creating* the CSV files.
>
> "Be liberal in what you accept, and conservative in what you send" (Jon Postel)
>
> In this case send == create, as the output might be sent to other, less liberal readers.
>
> I don't have a problem with the output being less flexible, so long as it is sufficiently flexible (which I think it likely is already).
>
> I don't think consistency is necessary - or even desirable - here.
>

Okay, but wouldn't you expect to be able to use a CSVFormat instance to read a file that you created with it? This is currently not the case.

Regards,
Benedikt

> >
> > Cheers
> >
> > ________________________________
> > From: Benedikt Ritter <brit...@apache.org>
> > To: Commons Developers List <dev@commons.apache.org>; brunodepau...@yahoo.com.br
> > Sent: Thursday, 23 August 2018 7:10 AM
> > Subject: Re: [CSV] Inconsistent record separator behavior
> >
> > Hi Bruno,
> >
> > On Wed, 22 Aug 2018 at 15:10, Bruno P. Kinoshita <brunodepau...@yahoo.com.br.invalid> wrote:
> >
> >> Hi,
> >>
> >> Will try to look at the code and give a better answer during the weekend. But, at the risk of asking a silly question: would it mean that users are not able to parse a CSV unless each CSV row is separated by LF or CRLF?
> >
> > Yes.
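The round-trip problem Benedikt describes can be illustrated with a small, self-contained Java model. It is a hypothetical simplification based only on the behavior described in this thread, not the Commons CSV source: the printing side ends every record and every comment line with the configured record separator, while the parsing side treats only CR, LF and CRLF as line ends and skips lines starting with "#". The names SeparatorModel, print and parseLines are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the printing/parsing mismatch discussed in this thread.
// This is NOT Commons CSV code; names and behavior are simplified assumptions.
class SeparatorModel {

    // Printing side: the comment and every record end with recordSeparator,
    // whatever that sequence is. A CR, LF or CRLF inside the comment starts a
    // new "# " comment line, but that line is also ended with recordSeparator.
    static String print(String comment, List<List<String>> records, String recordSeparator) {
        StringBuilder out = new StringBuilder("# ");
        for (int i = 0; i < comment.length(); i++) {
            char c = comment.charAt(i);
            if (c == '\r' || c == '\n') {
                if (c == '\r' && i + 1 < comment.length() && comment.charAt(i + 1) == '\n') {
                    i++; // treat CRLF as a single line break
                }
                out.append(recordSeparator).append("# ");
            } else {
                out.append(c);
            }
        }
        out.append(recordSeparator);
        for (List<String> record : records) {
            out.append(String.join(",", record)).append(recordSeparator);
        }
        return out.toString();
    }

    // Parsing side: only CR, LF and CRLF end a line; lines starting with '#'
    // are skipped as comments. The record separator plays no role here.
    static List<String> parseLines(String text) {
        List<String> records = new ArrayList<>();
        for (String line : text.split("\r\n|\r|\n")) {
            if (!line.isEmpty() && !line.startsWith("#")) {
                records.add(line);
            }
        }
        return records;
    }

    public static void main(String[] args) {
        List<List<String>> records = Arrays.asList(
                Arrays.asList("a", "b"), Arrays.asList("c", "d"));
        for (String separator : Arrays.asList("\r\n", "!")) {
            String output = print("first\nsecond", records, separator);
            System.out.println(output.replace("\r\n", "<CRLF>")
                    + " -> " + parseLines(output).size() + " record(s) read back");
        }
    }
}
```

With CRLF as the separator both records survive the round trip. With "!" the whole output is one physical line starting with "#", so the model's reader treats it as a single comment and reads back no records at all.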
> >
> >> I remember getting a CSV from a government website some time ago that was formatted in a very strange way. If I remember correctly, it was a small file without LF or CRLF. I think it used | to separate the rows and , for the columns.
> >>
> >
> > I didn't know that there are formats that don't use a new line as a line separator.
> >
> >>
> >> A quick search turned up at least one other person with a similar issue:
> >> https://stackoverflow.com/questions/29903202/how-to-read-csv-on-python-with-newline-separator
> >>
> >> Not sure if I understood the problem well, but in case it makes sense... my suggestion would be to check whether we could change CSVPrinter.printComment to accept other characters for line ending?
> >>
> >
> > The inconsistency I'm seeing is this: on the one hand, we accept any character sequence as a record separator. Comments are, in a way, like special records to me. But our implementation seems to put them on a new "line" using the println() method. The println() method in turn uses the record separator to start a new record, so it's not necessarily a new line. Nevertheless, while processing a comment, we look out for CR and LF and then call println() again. Maybe I'm just not getting it, but it feels pretty messed up :-)
> >
> > Regards,
> > Benedikt
> >
> >>
> >> Thanks!
> >>
> >> Bruno
> >>
> >> ________________________________
> >> From: Benedikt Ritter <brit...@apache.org>
> >> To: Commons Developers List <dev@commons.apache.org>
> >> Sent: Tuesday, 21 August 2018 7:13 PM
> >> Subject: [CSV] Inconsistent record separator behavior
> >>
> >> Hi,
> >>
> >> we have this strange handling of record separators / line endings in CSV:
> >>
> >> Users can use whatever character sequence they like as a record separator.
> >>
> >> I could, for example, use the ! character to mark the end of a record.
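For input like the file Bruno describes (rows separated by |, columns by ,), or records ended by "!", a reader has to split on the configured record separator rather than on CR/LF. The following is a minimal, hypothetical Java sketch: CustomSeparatorSplitter is not part of Commons CSV, the sample data is made up, and it assumes values never contain the separator or the delimiter, since it implements no quoting or escaping.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Commons CSV: splits text on an arbitrary
// record separator, then each record on a single-character delimiter.
// Limitation: no quoting or escaping, so separators inside values break it.
class CustomSeparatorSplitter {

    static List<List<String>> split(String text, String recordSeparator, char delimiter) {
        List<List<String>> records = new ArrayList<>();
        // Pattern.quote so separators such as "|" or "!" are matched literally
        for (String record : text.split(Pattern.quote(recordSeparator))) {
            if (record.isEmpty()) {
                continue; // skip empty records, e.g. from a doubled separator
            }
            // limit -1 keeps trailing empty fields such as "a,b,"
            records.add(Arrays.asList(record.split(Pattern.quote(String.valueOf(delimiter)), -1)));
        }
        return records;
    }

    public static void main(String[] args) {
        // rows separated by '|', columns by ','
        System.out.println(split("name,age|alice,30|bob,25|", "|", ','));
    }
}
```

Running main prints [[name, age], [alice, 30], [bob, 25]]. A real reader would also need to handle quoted values that contain the separator, which is where a plain split stops being enough.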
> >>
> >> Then we have CSVPrinter.printComment(String). This inserts comments into a CSV output. It detects CRLF and calls println() on the CSVFormat, which in turn uses the record separator to indicate a new record...
> >>
> >> So now I'm thinking: does it make sense to use anything other than LF or CRLF as a record separator? Maybe we should deprecate CSVFormat.recordSeparator(String) and introduce a LineEnding enum where users can choose between LF and CRLF. This way we can make the behavior between parsing and printing consistent.
> >>
> >> Thoughts?
> >>
> >> Benedikt
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> >> For additional commands, e-mail: dev-h...@commons.apache.org
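Benedikt's LineEnding proposal could look roughly like the Java sketch below. LineEnding, getSequence and LineEndingDemo.printComment are hypothetical (they are not existing Commons CSV API, and the enum's exact shape is an assumption). The idea it illustrates: if only LF or CRLF can be chosen, every comment line the printer writes ends in a real line break, so a reader that only recognizes CR, LF and CRLF can tell where each comment line stops.

```java
// Hypothetical sketch of the LineEnding enum proposed in this thread;
// it is not an existing Commons CSV type.
enum LineEnding {
    LF("\n"),
    CRLF("\r\n");

    private final String sequence;

    LineEnding(String sequence) {
        this.sequence = sequence;
    }

    // The characters a printer would write at the end of each record or comment line
    String getSequence() {
        return sequence;
    }
}

class LineEndingDemo {

    // Hypothetical helper: writes each line of a (possibly multi-line) comment
    // as "# <text>" followed by the chosen line ending. CR, LF and CRLF inside
    // the comment are all normalized to that ending.
    static String printComment(String comment, LineEnding ending) {
        StringBuilder out = new StringBuilder();
        for (String line : comment.split("\r\n|\r|\n", -1)) {
            out.append("# ").append(line).append(ending.getSequence());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints "# first" and "# second" on separate lines, each ending in LF
        System.out.print(printComment("first\r\nsecond", LineEnding.LF));
    }
}
```

Restricting the choice to an enum trades away the "!"-style separators that are accepted today, which is exactly the flexibility sebb argues the printing side can do without.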