[
https://issues.apache.org/jira/browse/SOLR-4799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14028206#comment-14028206
]
Mikhail Khludnev commented on SOLR-4799:
----------------------------------------
Plenty of side topics have been discussed here already; let me add one more. I
checked one thing with Kettle ETL (Pentaho). The main problem with Kettle is
its Eclipse-based IDE UI. Given the DIH replatforming, we expect some Web UI for
DSL editing. I found a sibling project,
[CDA|http://www.webdetails.pt/ctools/cda.html#cda_editor], which looks
pretty much like that. Here is the summary:
- the project itself seems modular enough (CBF), hence we can slice out some
pieces for use in DIH2.0
- CDA is just data access: whatever source to JSON via HTTP GET
- thus, it lacks the final indexing steps (via POST or xxxSolrServer);
- it also lacks a long-running command framework (it's a trivial thread with
interruption and status flags; not a big deal, but nothing comes for free there)
- it shows a pretty neat usage of ETL primitives (and I still think that Kettle's
guts are much more powerful than Morphlines'): it uses an XML DSL to configure
Kettle steps and runs the data export as an ETL process.
> SQLEntityProcessor for zipper join
> ----------------------------------
>
> Key: SOLR-4799
> URL: https://issues.apache.org/jira/browse/SOLR-4799
> Project: Solr
> Issue Type: New Feature
> Components: contrib - DataImportHandler
> Reporter: Mikhail Khludnev
> Priority: Minor
> Labels: dih
> Attachments: SOLR-4799.patch
>
>
> DIH is mostly considered a playground tool, and real usages end up with
> SolrJ. I want to contribute a few improvements targeting DIH performance.
> This one provides a performant approach for joining SQL entities with a very
> small memory footprint, in contrast to
> http://wiki.apache.org/solr/DataImportHandler#CachedSqlEntityProcessor
> The idea is:
> * the parent table is explicitly ordered by its PK in SQL
> * the children table is explicitly ordered by the parent_id FK in SQL
> * the children entity processor joins the ordered result sets with a ‘zipper’ algorithm.
> Do you think it’s worth contributing to DIH?
> cc: [~goksron] [~jdyer]
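
For readers unfamiliar with the 'zipper' idea in the issue description above, here
is a minimal sketch of a merge-style join over two pre-sorted row streams. It is
not the code from SOLR-4799.patch; the function name zipper_join, the column names
"id" and "parent_id", and the dict-per-row representation are all assumptions made
for illustration. It only shows why memory stays small: at most one parent's
children are held at a time.

```python
from typing import Any, Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, Any]


def zipper_join(
    parents: Iterable[Row],
    children: Iterable[Row],
    pk: str = "id",          # hypothetical parent primary-key column
    fk: str = "parent_id",   # hypothetical child foreign-key column
) -> Iterator[Tuple[Row, List[Row]]]:
    """Join two row streams that are BOTH already sorted ascending by key.

    Yields (parent, [matching children]) pairs while reading each stream once.
    """
    child_iter = iter(children)
    child = next(child_iter, None)
    for parent in parents:
        key = parent[pk]
        # Skip orphaned children whose FK sorts before the current parent.
        while child is not None and child[fk] < key:
            child = next(child_iter, None)
        # Collect the run of children that belong to this parent.
        matched: List[Row] = []
        while child is not None and child[fk] == key:
            matched.append(child)
            child = next(child_iter, None)
        yield parent, matched


if __name__ == "__main__":
    parent_rows = [{"id": 1}, {"id": 2}, {"id": 3}]
    child_rows = [
        {"parent_id": 1, "v": "a"},
        {"parent_id": 1, "v": "b"},
        {"parent_id": 3, "v": "c"},
    ]
    for p, cs in zipper_join(parent_rows, child_rows):
        print(p["id"], [c["v"] for c in cs])
```

The sketch relies on both queries carrying an explicit ORDER BY on the join key,
as the description requires; if either stream is unsorted, children are silently
dropped rather than matched.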