Wow - I will check out your other tool suggestions Thanks for that. We are using 'sed' often for cleanups of files, handy, but not a fun way to "fix" things from legacy systems.
One thing I *suspect* is that adding additional args to cypher e.g. trim(csvLine.MyCol) adds time to the importer? where as making sure the csv file is clean before import would go faster? Thoughts? Dave On Sunday, November 2, 2014 6:57:43 PM UTC-5, Michael Hunger wrote: > > There are some other tools like csvkit and csv lint like mentioned here, > that might help: http://jexp.de/blog/2014/10/load-cvs-with-success/ > <http://www.google.com/url?q=http%3A%2F%2Fjexp.de%2Fblog%2F2014%2F10%2Fload-cvs-with-success%2F&sa=D&sntz=1&usg=AFQjCNGWRDXVGCcR3ipPjW566SFPwI5mTQ> > > Did you try the single escaping \" > > For the time being, perhaps a simple sed script like "sed -e 's/"/''/g' > might help? replace a double quote with two single quotes? > > > > On Mon, Nov 3, 2014 at 12:32 AM, David Bigelow < > [email protected] <javascript:>> wrote: > >> Thanks for getting back so quick. >> >> Please don't assume everything is fully qualified regarding CSV files - >> that will cause many problems with the CSV importer in cypher for many >> customers. >> >> Legacy systems that we are getting data from do NOT export with >> qualifiers around text - of often we get just a delimited field break >> character. Many times exports from legacy systems (think green screens) >> have "fixed" width fields and NO qualifiers for data type. So you may get >> a 20 character field with only 3 characters of text in it routinely. A >> good 'sed' script can fix that to squeeze the air out of the file on unix, >> BUT adding in text qualifiers BLOATS the file on larger data sets to get it >> to go through the neo4j csv importer (not to mention it can be complicated >> to do well). >> >> As a hack, we have used sqlite3-shell to create an in-memory sql database >> to import CSV files, and then clean them up and then re-export them to a >> more formatted CSV file - BUT, even that export has it's own assumptions as >> what a "value" is. -- sqlite will often write single word/numbers values >> WITHOUT double quotes and multi-word/number values as strings. >> >> It would be MUCH BETTER to have the cypher CSV importer consider the >> FIELDTERMINIATOR as the separator between field values. and then >> optionally qualify if the value has a text qualifier or not (e.g. >> double-quotes around the value by default). Hence my request/suggestion >> for a *TEXTQUALIFIER true/false option.* >> >> This is a BIG issue. (huge, if you are trying to migrate legacy system >> data into neo4j and do daily updates while legacy systems are kept alive >> and feeding updates to neo4j). >> >> Dave >> >> Always open to suggestion on this, but I have pounded my head on this >> issue for the better part of a weekend trying to figure out what exactly is >> causing my problems.... >> >> >> >> >> On Sunday, November 2, 2014 5:58:35 PM UTC-5, Michael Hunger wrote: >>> >>> Does using just a doubled double quote "" or an escape quote \" help? >>> >>> The common CSV idioms indicate that as soon as you have special stuff in >>> your text values like quotes or newlines and such, you _must_ quote them. >>> >>> Which tool created this unusual CSV format? >>> >>> Michael >>> >>> On Sun, Nov 2, 2014 at 10:52 PM, David Bigelow <davidh...@ >>> simplifiedlogic.com> wrote: >>> >>>> This was a painful discovery. >>>> >>>> It appears that neo4j is EXPECTING Quoted content in a CSV file no >>>> matter what. >>>> >>>> If you send data that does NOT have Text Qualifiers as an option from >>>> another database source to generate your CSV File but DOES have Double >>>> Quotes within the content, neo4j will consider the double-quote as the >>>> start of a separated content REGARDLESS of your FIELDTERMINIATOR setting. >>>> >>>> For example (4 lines of text to be added to a single node property) >>>> >>>> ABC1203|1|Length: 2" from left >>>> ABC1203|2|Textured >>>> ABC1203|3|Depth: 6" from angle >>>> ABC1203|4|Thickness: 8" from edge >>>> ABC1203|5|Paint: 0.02" all around >>>> >>>> Might look like the following after import: >>>> from left ABC1203|2|Textured ABC1203|3|Depth: 6 from edge >>>> ABC1203|5|Paint: 0.02" all around >>>> >>>> >>>> The problem appears to be that neo4j is expecting values to be >>>> contained in Quotes and Properly Escaped: >>>> ABC1203|1|"Length: 2\" from left" >>>> ABC1203|2|"Textured" >>>> ABC1203|3|"Depth: 6\" from angle" >>>> ABC1203|4|"Depth: 8\" from edge" >>>> ABC1203|5|"Paint: 0.02\" all around" >>>> >>>> This will get you more like this: >>>> Length: 2" from left Height: 2" from base Depth: 6" from angle >>>> Thickness: 8" from edge Paint: 0.02" all around >>>> >>>> >>>> This is a bit of problem relative to importing CSV files... Not all >>>> systems will write data out the same way for CSV. >>>> >>>> For example, sqlite3-shell will write out the above data like this - >>>> notice how a single value is NOT Text Qualified, and it double escapes >>>> Double Quotes. >>>> >>>> ABC1203|1|"Length: 2"" from left" >>>> ABC1203|2|Textured >>>> ABC1203|3|"Depth: 6"" from angle" >>>> ABC1203|4|"Thickness: 8"" from edge" >>>> ABC1203|5|"Paint: 0.02"" all around" >>>> >>>> *Proposed Solution:* >>>> Include a *TEXTQUALIFIER* *true/false *option so that the default rule >>>> for assuming double quotes around values can be disabled more easily and >>>> also assume that content between the FIELDTERMINIATOR are qualified by the >>>> FIELDTERMINIATOR that is specified by the user in the header of the CSV >>>> import cypher. >>>> >>>> (note: this is a hacky example - I can come up with something more >>>> definitive if necessary - but I have been fighting this quite a bit). >>>> >>>> Dave >>>> >>>> >>>> >>>> -- >>>> You received this message because you are subscribed to the Google >>>> Groups "Neo4j" group. >>>> To unsubscribe from this group and stop receiving emails from it, send >>>> an email to [email protected]. >>>> For more options, visit https://groups.google.com/d/optout. >>>> >>> >>> -- >> You received this message because you are subscribed to the Google Groups >> "Neo4j" group. >> To unsubscribe from this group and stop receiving emails from it, send an >> email to [email protected] <javascript:>. >> For more options, visit https://groups.google.com/d/optout. >> > > -- You received this message because you are subscribed to the Google Groups "Neo4j" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. For more options, visit https://groups.google.com/d/optout.
