Thanks for getting back so quick.

Please don't assume everything is fully qualified regarding CSV files - 
that will cause many problems with the CSV importer in cypher for many 
customers.

Legacy systems that we are getting data from do NOT export with qualifiers 
around text - of often we get just a delimited field break character.  Many 
times exports from legacy systems (think green screens) have "fixed" width 
fields and NO qualifiers for data type.  So you may get a 20 character 
field with only 3 characters of text in it routinely.  A good 'sed' script 
can fix that to squeeze the air out of the file on unix, BUT adding in text 
qualifiers BLOATS the file on larger data sets to get it to go through the 
neo4j csv importer (not to mention it can be complicated to do well).

As a hack, we have used sqlite3-shell to create an in-memory sql database 
to import CSV files, and then clean them up and then re-export them to a 
more formatted CSV file - BUT, even that export has it's own assumptions as 
what a "value" is. -- sqlite will often write single word/numbers values 
WITHOUT double quotes and multi-word/number values as strings.

It would  be MUCH BETTER to have the cypher CSV importer consider the 
FIELDTERMINIATOR as the separator between field values.  and then 
optionally qualify if the value has a text qualifier or not (e.g. 
double-quotes around the value by default).  Hence my request/suggestion 
for a *TEXTQUALIFIER true/false option.*

This is a BIG issue. (huge, if you are trying to migrate legacy system data 
into neo4j and do daily updates while legacy systems are kept alive and 
feeding updates to neo4j).

Dave

Always open to suggestion on this, but I have pounded my head on this issue 
for the better part of a weekend trying to figure out what exactly is 
causing my problems....




On Sunday, November 2, 2014 5:58:35 PM UTC-5, Michael Hunger wrote:
>
> Does using just a doubled double quote "" or an escape quote \" help?
>
> The common CSV idioms indicate that as soon as you have special stuff in 
> your text values like quotes or newlines and such, you _must_ quote them.
>
> Which tool created this unusual CSV format?
>
> Michael
>
> On Sun, Nov 2, 2014 at 10:52 PM, David Bigelow <
> [email protected] <javascript:>> wrote:
>
>> This was a painful discovery.
>>
>> It appears that neo4j is EXPECTING Quoted content in a CSV file no matter 
>> what.
>>
>> If you send data that does NOT have Text Qualifiers as an option from 
>> another database source to generate your CSV File but DOES have Double 
>> Quotes within the content, neo4j will consider the double-quote as the 
>> start of a separated content REGARDLESS of your FIELDTERMINIATOR setting.
>>
>> For example (4 lines of text to be added to a single node property)
>>
>> ABC1203|1|Length: 2" from left
>> ABC1203|2|Textured
>> ABC1203|3|Depth: 6" from angle
>> ABC1203|4|Thickness: 8" from edge
>> ABC1203|5|Paint: 0.02" all around
>>
>> Might look like the following after import:
>>  from left ABC1203|2|Textured ABC1203|3|Depth: 6  from edge 
>> ABC1203|5|Paint: 0.02" all around
>>
>>
>> The problem appears to be that neo4j is expecting values to be contained 
>> in Quotes and Properly Escaped:
>> ABC1203|1|"Length: 2\" from left"
>> ABC1203|2|"Textured"
>> ABC1203|3|"Depth: 6\" from angle"
>> ABC1203|4|"Depth: 8\" from edge"
>> ABC1203|5|"Paint: 0.02\" all around"
>>
>> This will get you more like this:
>> Length: 2" from left Height: 2" from base Depth: 6" from angle Thickness: 
>> 8" from edge Paint: 0.02" all around
>>
>>
>> This is a bit of problem relative to importing CSV files...  Not all 
>> systems will write data out the same way for CSV.
>>
>> For example, sqlite3-shell will write out the above data like this - 
>> notice how a single value is NOT Text Qualified, and it double escapes 
>> Double Quotes.
>>
>> ABC1203|1|"Length: 2"" from left"
>> ABC1203|2|Textured
>> ABC1203|3|"Depth: 6"" from angle"
>> ABC1203|4|"Thickness: 8"" from edge"
>> ABC1203|5|"Paint: 0.02"" all around"
>>
>> *Proposed Solution:*
>> Include a *TEXTQUALIFIER* *true/false *option so that the default rule 
>> for assuming double quotes around values can be disabled more easily and 
>> also assume that content between the FIELDTERMINIATOR are qualified by the 
>> FIELDTERMINIATOR that is specified by the user in the header of the CSV 
>> import cypher.
>>
>> (note: this is a hacky example - I can come up with something more 
>> definitive if necessary - but I have been fighting this quite a bit).
>>
>> Dave
>>
>>
>>
>>  -- 
>> You received this message because you are subscribed to the Google Groups 
>> "Neo4j" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected] <javascript:>.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Neo4j" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.

Reply via email to