[ 
https://issues.apache.org/jira/browse/SOLR-4799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Khludnev updated SOLR-4799:
-----------------------------------

    Attachment: SOLR-4085.patch

Attaching the first drop. 

I'm not sure I share your idea, [~jdyer], of adding the zipper ability to all 
processors; anyway, let's see how it turns out.

The implementation itself is not a big deal, since it's based on Guava; it's 
enabled by join='zipper'. Note: it doesn't support the People *-> Country 
(many-to-one) case, only the classic People -*> Sports (one-to-many) case, 
though a one-liner covers that.
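For readers new to the technique, here is a minimal sketch of a zipper (merge) join for the one-to-many People -*> Sports case. It is not the patch's Guava-based code: it uses plain java.util, and the class ZipperJoin, the Child row type, and the integer keys are hypothetical. It assumes parents arrive in ascending PK order and children in ascending parent_id order, and it reads each input exactly once with a single look-ahead child row.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

public class ZipperJoin {

    /** Hypothetical child row: the parent FK plus one payload column. */
    static final class Child {
        final int parentId;
        final String value;

        Child(int parentId, String value) {
            this.parentId = parentId;
            this.value = value;
        }
    }

    /**
     * Single forward pass over both inputs. Assumes parents are in ascending PK
     * order and children in ascending parent_id order. Orphaned children are
     * skipped; a childless parent is emitted with an empty list.
     */
    static void join(Iterator<Integer> parents, Iterator<Child> children,
                     BiConsumer<Integer, List<String>> emit) {
        Child pending = children.hasNext() ? children.next() : null; // one-row look-ahead
        while (parents.hasNext()) {
            int pk = parents.next();
            // Advance past children whose FK is below the current PK (orphans).
            while (pending != null && pending.parentId < pk) {
                pending = children.hasNext() ? children.next() : null;
            }
            // Gather the run of children whose FK equals the current PK.
            List<String> matched = new ArrayList<>();
            while (pending != null && pending.parentId == pk) {
                matched.add(pending.value);
                pending = children.hasNext() ? children.next() : null;
            }
            emit.accept(pk, matched); // hand off this parent's children immediately
        }
    }

    public static void main(String[] args) {
        List<Integer> people = List.of(1, 2, 4);            // person 2 has no sports
        List<Child> sports = List.of(
                new Child(1, "chess"), new Child(1, "golf"),
                new Child(3, "polo"),                        // orphan: there is no person 3
                new Child(4, "rugby"));
        Map<Integer, List<String>> out = new LinkedHashMap<>();
        join(people.iterator(), sports.iterator(), out::put);
        System.out.println(out); // {1=[chess, golf], 2=[], 4=[rugby]}
    }
}
```

The callback is what keeps memory flat: each parent's children are handed off as soon as their run ends, instead of materializing the whole child table the way a cache does. The map in main exists only to print the demo. The sample data deliberately includes a sportless person (2) and an orphaned sport (parent_id 3).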

I extracted a DIHSupport constructor, which parses attrs into a Relation class. 
I introduced Zipper as an EntityProcessor-internal strategy, like 
DIHCacheSupport. All of this should probably be extracted into a few proper 
strategies in the future.

The Derby test covers only sports, not countries. Countries could be covered 
too, but not both at once: joining both sides by zipper would make the test 
very puzzling. So that needs to be addressed later.

What worries me most is the test data. From what I see, we have only vanilla 
data: every person has one or a few sports. The zipper's caveats are orphaned 
sports and sportless people; if there is a bug in the zipper, it can mess up 
the following entities. By the way, given my experience from the DIH vs. 
threads battle, I'd say this threatens the caching implementations too. 
Ideally, I'd like to pause this one, improve the Derby test to cover orphaned 
children and childless parents, and continue with the zipper afterwards.
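A zipper trusts the ORDER BY of both queries, so mis-ordered input does not fail loudly: a child row that arrives after the join has already moved past its parent's key is simply skipped rather than reported. One possible safeguard, offered here as a hypothetical sketch and not something in the patch, is to wrap each input in a fail-fast iterator that rejects keys going backwards. OrderedGuard, its strict flag, and the label strings are illustrative names only.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.ToIntFunction;

/** Hypothetical fail-fast wrapper: rejects rows whose join key goes backwards. */
public final class OrderedGuard<T> implements Iterator<T> {
    private final Iterator<T> delegate;
    private final ToIntFunction<T> key;
    private final String label;
    private final boolean strict; // true for a PK stream (no duplicates), false for an FK stream
    private boolean seen = false;
    private int lastKey;

    public OrderedGuard(Iterator<T> delegate, ToIntFunction<T> key, String label, boolean strict) {
        this.delegate = delegate;
        this.key = key;
        this.label = label;
        this.strict = strict;
    }

    @Override
    public boolean hasNext() {
        return delegate.hasNext();
    }

    @Override
    public T next() {
        T row = delegate.next();
        int k = key.applyAsInt(row);
        // Strict mode also rejects repeats; non-strict allows runs of equal FKs.
        boolean outOfOrder = strict ? k <= lastKey : k < lastKey;
        if (seen && outOfOrder) {
            throw new IllegalStateException(
                    label + " is not ordered: key " + k + " after " + lastKey);
        }
        seen = true;
        lastKey = k;
        return row;
    }

    public static void main(String[] args) {
        // FK stream: repeated parent_id values are fine, a decrease is not.
        Iterator<Integer> fks = new OrderedGuard<>(List.of(1, 1, 3, 2).iterator(),
                Integer::intValue, "sports.parent_id", false);
        try {
            while (fks.hasNext()) {
                fks.next();
            }
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // sports.parent_id is not ordered: key 2 after 3
        }
    }
}
```

Strict mode on the parent stream also catches duplicate PKs, which a one-to-many merge join handles badly: the first duplicate consumes all the matching children and the second gets none.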

Please let me know what you think!  
                
> SQLEntityProcessor for zipper join
> ----------------------------------
>
>                 Key: SOLR-4799
>                 URL: https://issues.apache.org/jira/browse/SOLR-4799
>             Project: Solr
>          Issue Type: New Feature
>          Components: contrib - DataImportHandler
>            Reporter: Mikhail Khludnev
>            Priority: Minor
>              Labels: dih
>         Attachments: SOLR-4085.patch
>
>
> DIH is mostly considered a playground tool, and real usages end up with 
> SolrJ. I want to contribute a few improvements targeting DIH performance.
> This one provides a performant approach for joining SQL entities with a 
> minimal memory footprint, in contrast to 
> http://wiki.apache.org/solr/DataImportHandler#CachedSqlEntityProcessor  
> The idea is:
> * the parent table is explicitly ordered by its PK in SQL
> * the child table is explicitly ordered by the parent_id FK in SQL
> * the child entity processor joins the ordered result sets with a 'zipper' algorithm.
> Do you think it's worth contributing to DIH?
> cc: [~goksron] [~jdyer]
