From what I understand, you want a single ES document with a 1:N name:address relation, where the only ID available is for the name (in the example, 0000003934 for Kelly A. Draper).
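To make that target shape concrete, here is a minimal sketch in Python, outside the river. It assumes the rows arrive as the flat join output quoted below. It groups them by `_id` and keeps each distinct name tuple and address tuple once, as nested lists. The function name `collapse_rows` and the `names`/`addresses` keys are made up for illustration. This is not what the JDBC river does internally.

```python
from itertools import product

# Column groups taken from the query in the quoted post.
NAME_FIELDS = ("first_name", "middle_name", "last_name")
ADDRESS_FIELDS = ("street1", "street2", "street3", "city", "state_code", "zipcode")


def collapse_rows(rows):
    """Group flat join rows by _id, keeping each distinct name and address once.

    Hypothetical helper: field grouping is decided here, client-side,
    because the rows themselves carry no per-name or per-address identifier.
    """
    docs = {}
    for row in rows:
        doc = docs.setdefault(row["_id"], {
            "pref_mail_name": row["pref_mail_name"],
            "names": [],
            "addresses": [],
        })
        name = {f: row.get(f) for f in NAME_FIELDS}
        address = {f: row.get(f) for f in ADDRESS_FIELDS}
        # The join repeats every name for every address, so deduplicate
        # on the whole tuple instead of on single columns.
        if name not in doc["names"]:
            doc["names"].append(name)
        if address not in doc["addresses"]:
            doc["addresses"].append(address)
    return docs


# Two names and two addresses from the example, combined the way
# the left joins combine them (every name with every address).
names = [
    {"first_name": "Kelly", "middle_name": "Ann", "last_name": "Draper"},
    {"first_name": "Kelly", "middle_name": "A.", "last_name": "McElroy"},
]
addresses = [
    {"street1": "13679 Stoney Springs Dr", "street2": " ", "street3": " ",
     "city": "Chardon", "state_code": "OH", "zipcode": "44024-8918"},
    {"street1": "13765 Equestrian Dr", "street2": " ", "street3": " ",
     "city": "Burton", "state_code": "OH", "zipcode": "44021-9552"},
]
rows = [
    dict(_id="0000003934", pref_mail_name="Kelly A. Draper", **name, **address)
    for name, address in product(names, addresses)
]

docs = collapse_rows(rows)
doc = docs["0000003934"]
print(len(rows), "rows ->", len(doc["names"]), "names,", len(doc["addresses"]), "addresses")
```

Grouping on whole tuples keeps street, city and zipcode of one address together, which per-column arrays cannot do.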
It would help to define more identifiers for each address also, so you could index the addresses in one index, and person names in the other index, with two rivers.

The support for nested objects in SQL pseudo column bracket notation is somewhat limited in JDBC river. If anyone feels like improving this, patches/pull requests would be very welcome!

At the moment I feel without any identifiers or given enumeration scheme, it is impossible to identify a sequence of JSON objects in a nested document that can be collapsed/grouped.

Jörg

On Tue, Apr 22, 2014 at 4:35 PM, jrizzi1 <[email protected]> wrote:

> I am having an issue with the jdbc river collapsing during the bulk insert
>
> i have records that have some single value properties, and can have multiple
> value properties (names, addresses and emails)
>
> there are a total of around 4.5 million rows that collapse down to 600k
>
> if the river sql criteria is set to be where id="001", it works fine
>
> but during the bulk process ie all of my rows, only one property that can
> have multiple values is correct, other properties are missing data
>
> here is an example of what the query output that the river is using to
> collapse to JSON
> it has 2 middle names, 2 last names, and 4 addresses
>
> _id | pref_mail_name | pref_class_year | record_status_code | first_name | middle_name | last_name | street1 | street2 | street3 | city | state_code | zipcode | email_address
> 0000003934 | Kelly A. Draper | 1999 | A | Kelly | Ann | Draper | 13679 Stoney Springs Dr | | | Chardon | OH | 44024-8918 | [email protected]
> 0000003934 | Kelly A. Draper | 1999 | A | Kelly | Ann | Draper | 1400 McDonald Investment Ctr | 800 Superior Ave E Ste 1400 | | Cleveland | OH | 44114-2617 | [email protected]
> 0000003934 | Kelly A. Draper | 1999 | A | Kelly | Ann | Draper | 13156 Aldenshire Dr | | | Chardon | OH | 44024-8921 | [email protected]
> 0000003934 | Kelly A. Draper | 1999 | A | Kelly | Ann | Draper | 100 7th Ave Ste 150 | | | Chardon | OH | 44024-7808 | [email protected]
> 0000003934 | Kelly A. Draper | 1999 | A | Kelly | Ann | Draper | 13765 Equestrian Dr | | | Burton | OH | 44021-9552 | [email protected]
> 0000003934 | Kelly A. Draper | 1999 | A | Kelly | A. | McElroy | 13679 Stoney Springs Dr | | | Chardon | OH | 44024-8918 | [email protected]
> 0000003934 | Kelly A. Draper | 1999 | A | Kelly | A. | McElroy | 1400 McDonald Investment Ctr | 800 Superior Ave E Ste 1400 | | Cleveland | OH | 44114-2617 | [email protected]
> 0000003934 | Kelly A. Draper | 1999 | A | Kelly | A. | McElroy | 13156 Aldenshire Dr | | | Chardon | OH | 44024-8921 | [email protected]
> 0000003934 | Kelly A. Draper | 1999 | A | Kelly | A. | McElroy | 100 7th Ave Ste 150 | | | Chardon | OH | 44024-7808 | [email protected]
> 0000003934 | Kelly A. Draper | 1999 | A | Kelly | A. | McElroy | 13765 Equestrian Dr | | | Burton | OH | 44021-9552 | [email protected]
>
> after a river run, the indexed doc has 4 addresses, but only one middle name
> and one last name, the other never was indexed
>
>     "_source": {
>         "pref_mail_name": "Kelly A. Draper",
>         "street2": [
>             " ",
>             "800 Superior Ave E Ste 1400"
>         ],
>         "street1": [
>             "13679 Stoney Springs Dr",
>             "1400 McDonald Investment Ctr",
>             "13156 Aldenshire Dr",
>             "100 7th Ave Ste 150",
>             "13765 Equestrian Dr"
>         ],
>         "state_code": "OH",
>         "middle_name": "A.",
>         "zipcode": [
>             "44024-8918",
>             "44114-2617",
>             "44024-8921",
>             "44024-7808",
>             "44021-9552"
>         ],
>         "pref_class_year": "1999",
>         "record_status_code": "A",
>         "city": [
>             "Chardon",
>             "Cleveland",
>             "Burton"
>         ],
>         "first_name": "Kelly",
>         "last_name": "McElroy",
>         "street3": " ",
>         "email_address": "[email protected]"
>     }
> }
>
> I have attempted using bracket notation for creating objects, but the same
> issue exists, only now the properties are nested
>
> my river looks like this
>
> PUT /_river/matcher/_meta
> {
>     "type" : "jdbc",
>     "jdbc" : {
>         "url" : "serverurl",
>         "user" : "USER",
>         "password" : "#########",
>         "sql" : "select e.id_number as \"_id\", e.pref_mail_name as \"pref_mail_name\", e.pref_class_year as \"pref_class_year\", e.record_status_code as \"record_status_code\", a.street1 as \"street1\", a.street2 as \"street2\", a.street3 as \"street3\", a.city as \"city\", a.state_code as \"state_code\", a.zipcode as \"zipcode\", n.first_name as \"first_name\", n.middle_name as \"middle_name\", n.last_name as \"last_name\", email.email_address as \"email_address\" from entity e left join name n on e.id_number = n.id_number left join email on e.id_number = email.id_number left join address a on e.id_number = a.id_number where e.person_or_org = 'P' and e.record_status_code IN ('A', 'L', 'D') ",
>         "index" : "matcher",
>         "type" : "entity",
>         "bulk_size" : 160,
>         "max_bulk_requests" : 5
>     }
> }
>
> let me know if i can provide additional info
>
> --
> View this message in context:
> http://elasticsearch-users.115913.n3.nabble.com/JDBC-river-query-results-collapsing-to-JSON-issue-tp4054562.html
> Sent from the ElasticSearch Users mailing list archive at Nabble.com.
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/1398177305643-4054562.post%40n3.nabble.com.
> For more options, visit https://groups.google.com/d/optout.
