Re: [Neo4j] Incremental Data Load in Neo4j DB From Hive

Pranab Banerjee Wed, 27 Jan 2016 03:56:41 -0800

Thank you, Michael.

I am new to Neo4j, and as far as I know, "MERGE" will not partially use
existing patterns: it is all or nothing. That means either the whole pattern
matches or the whole pattern is created.


For example, with data like the below, I am not sure whether MERGE will be
effective.


One-time load (/user/home/onetime.csv):
=============================================
Id  field1  field2  field3  last_modified_dt
==  ======  ======  ======  ===================
1   abcd    efgh    12      2016-01-25 09:22:03  <- (new entry)
2   mnop    efgh    14      2016-01-25 09:22:04  <- (new entry)

After loading the above one-time data into the respective nodes, we received
the incremental data below:

Incremental load (/user/home/incremental.csv):
===============================================
Id  field1  field2  field3  last_modified_dt
==  ======  ======  ======  ===================
2   txyz    efgh    18      2016-01-27 09:48:03  <- (modified data)
3   hijk    octu    17      2016-01-27 09:49:00  <- (new entry)
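
My guess is that merging on the Id alone, and setting the remaining fields
in a separate SET, would sidestep the all-or-nothing behaviour, since only
the key would then have to match. Below is a minimal sketch of what I mean.
It assumes a placeholder label Record, a comma-separated file with the header
Id,field1,field2,field3,last_modified_dt, and a file URL that the server can
actually read. None of these are confirmed.

```cypher
// Sketch only. The label :Record, the header names and the file URL are
// assumptions for illustration; adjust them to the real setup.
LOAD CSV WITH HEADERS FROM "file:///user/home/incremental.csv" AS row
// Merge on the key only, so an existing node is found even when its
// other fields have changed.
MERGE (n:Record {id: row.Id})
// A plain SET (not ON CREATE SET) runs for both created and matched nodes.
SET n.field1 = row.field1,
    n.field2 = row.field2,
    n.field3 = row.field3,
    n.last_modified_dt = row.last_modified_dt
```

With the incremental file above, I would expect Id 2 to be matched and
overwritten and Id 3 to be created. LOAD CSV delivers every value as a
string, so field3 would stay a string unless converted, and the one-time
load would need to store id the same way for the match to work. At our
volume an index or uniqueness constraint on the id property would presumably
be needed as well.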

Appreciate any suggestion on this. Thanks in advance.

Pranab



On Wed, Jan 27, 2016 at 4:50 PM, Michael Hunger <
michael.hun...@neotechnology.com> wrote:

> If you have a timestamp or other flag in Hive that marks the data as
> "new", you can use just a SELECT statement to get that information.
>
> In general you'd use merge with parameters for that:
>
> MERGE (n:Label {id:{id}}) ON CREATE SET n.foo = {foo}, n.bar = {bar}
>
> or if you *always* want to update properties
>
> MERGE (n:Label {id:{id}})
> SET n.foo = {foo}, n.bar = {bar}
>
> For the actual run, there are several options.
>
> You can also export the select results to CSV (or make that CSV available
> via http) and use LOAD CSV
>
> LOAD CSV WITH HEADERS FROM "URL" as row
> MERGE (n:Label {id:row.id}) ON CREATE SET n.foo = row.foo, n.bar = row.bar
>
> or even
>
> LOAD CSV WITH HEADERS FROM "URL" as row
> MERGE (n:Label {id:row.id}) ON CREATE SET n += row
>
>
> Or pass all rows of the batch in as a parameter, e.g. a list of
> {id:id, data: {col1:value, col2:value}}
>
> UNWIND {rows} as row
> MERGE (n:Label {id:row.id}) ON CREATE SET n += row.data
>
> Michael
>
> On 27.01.2016 at 06:55, Pranab Banerjee <prana...@gmail.com> wrote:
>
> Hi
>
> This is regarding incremental Data Ingestion (Modify/New-Add) to Neo4j
> from Hive Data source.
>
> We need to incorporate the On-going "ChangeOnly/New" data load from source
> table (in Hive) to Neo4j DB.
>
>     1. If the node already exists in the Neo4j DB, then only update/modify
> that specific node's data.
>     2. If the node doesn't exist in the Neo4j DB, then append that specific
> node's data as a new node.
>
> Can you please suggest an effective solution when the data volume is at
> scale (~5 million rows per day)?
>
> Thanks
> Pranab
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "Neo4j" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to neo4j+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>
