[
https://issues.apache.org/jira/browse/SOLR-2943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13908084#comment-13908084
]
Manjunath commented on SOLR-2943:
---------------------------------
Nice feature to have :-). Waiting to use it :-)
> DIHCacheWriter & DIHCacheProcessor (entity processor)
> -----------------------------------------------------
>
> Key: SOLR-2943
> URL: https://issues.apache.org/jira/browse/SOLR-2943
> Project: Solr
> Issue Type: New Feature
> Components: contrib - DataImportHandler
> Affects Versions: 4.0-ALPHA
> Reporter: James Dyer
> Priority: Minor
> Fix For: 4.7
>
> Attachments: SOLR-2943.patch, SOLR-2943.patch, SOLR-2943.patch
>
>
> This is a spin-off of SOLR-2382.
> Currently, DIH requires users to retrieve, join, and index all data for a full
> or delta update in one big step. This issue is to allow us to break this
> into individual steps. The idea is to have multiple "data-config.xml" files,
> some of which retrieve and cache data while others join and index data.
> This is useful when Solr Records are a conglomeration of several data
> sources. With this feature, each data source can be retrieved and cached
> separately. Once all data sources have been retrieved, they can be joined
> and indexed in a final step. When doing a delta update, only the data
> sources that change need to have their caches updated (or frequently-changing
> data can remain un-cached while caching the more static data). This is
> particularly useful in light of the fact that Lucene/Solr cannot do a true
> "update" operation. DIH Caches also provide a handy way to archive source
> data for which there is no stable system-of-record.
>
> Implementation Details:
> - The DIHCacheWriter allows us to write the final (root entity) DIH output to
> a DIHCache rather than to Solr. Caches can be created from scratch
> ("full-update") or existing caches can be modified ("delta-update").
> - The DIHCacheProcessor is an Entity Processor that reads a DIHCache. This
> Entity Processor can be used for both Root Entities and Child Entities.
> Cached data can be read back, joined to other Entities and indexed.
> - Both DIHCacheWriter and DIHCacheProcessor support partitioning.
> DIHCacheWriter can write to a partitioned cache while DIHCacheProcessor can
> read back a particular partition. This can be handy when indexing to
> multiple shards.
> - This patch is fully independent of the rest of DIH, so while users can
> patch and rebuild the DIH .jar file to include these classes, doing so is
> unnecessary. To use this functionality, simply put the code here on the
> classpath (e.g., in SOLR_HOME/lib).
> - In addition to this patch, a persistent cache implementation is required.
>   - See SOLR-2948 for a DIH Cache Implementation built on Lucene (no
>     additional dependencies).
>   - See SOLR-2613 for a DIH Cache Implementation backed with BDB-JE (we use
>     this in Production).
>   - Other Cache Implementations (hopefully) will be developed in the future
>     and become available for general use.
> - This patch includes extensive unit tests. A MockDIHCache that supports
> persistence and delta updates facilitates the tests. Do not attempt to use
> MockDIHCache for anything other than testing or as a reference for developing
> your own DIHCache implementations.
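
For readers who want a concrete picture of the flow described above (cache each source separately, partition the cache, then read one partition back and join it), here is a minimal Java sketch. It is not taken from the patch: PartitionedCache, partitionFor, joinPartition and row are hypothetical names, the cache is a plain in-memory map rather than a persistent DIHCache, and routing by key hash modulo the partition count is an assumed rule, not necessarily what DIHCacheWriter does.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only. None of these names come from the SOLR-2943 patch.
public class DihCacheSketch {

    // Hypothetical in-memory stand-in for a partitioned, keyed cache.
    public static final class PartitionedCache {
        private final List<Map<String, List<Map<String, Object>>>> partitions = new ArrayList<>();

        public PartitionedCache(int numPartitions) {
            for (int i = 0; i < numPartitions; i++) {
                partitions.add(new LinkedHashMap<>());
            }
        }

        // Assumed routing rule: non-negative hash of the key modulo the partition count.
        public int partitionFor(String key) {
            return Math.floorMod(key.hashCode(), partitions.size());
        }

        // "Writer" role: store a row under its key in the partition the key routes to.
        public void write(String key, Map<String, Object> row) {
            partitions.get(partitionFor(key))
                      .computeIfAbsent(key, k -> new ArrayList<>())
                      .add(row);
        }

        // "Processor" role: read back every key and row held by one partition.
        public Map<String, List<Map<String, Object>>> readPartition(int partition) {
            return partitions.get(partition);
        }

        // Rows cached under one key, or an empty list if the key was never written.
        public List<Map<String, Object>> lookup(String key) {
            return partitions.get(partitionFor(key)).getOrDefault(key, Collections.emptyList());
        }
    }

    // Final "join" step for one partition: attach cached child rows to each parent row.
    public static List<Map<String, Object>> joinPartition(
            PartitionedCache parents, PartitionedCache children, int partition, String childField) {
        List<Map<String, Object>> docs = new ArrayList<>();
        for (Map.Entry<String, List<Map<String, Object>>> entry
                : parents.readPartition(partition).entrySet()) {
            for (Map<String, Object> parent : entry.getValue()) {
                Map<String, Object> doc = new LinkedHashMap<>(parent);
                doc.put(childField, children.lookup(entry.getKey()));
                docs.add(doc);
            }
        }
        return docs;
    }

    // Small helper that builds a single-field row.
    public static Map<String, Object> row(String field, Object value) {
        Map<String, Object> r = new LinkedHashMap<>();
        r.put(field, value);
        return r;
    }

    public static void main(String[] args) {
        // Step 1: each source is cached on its own, as if by a separate data-config.xml.
        PartitionedCache items = new PartitionedCache(2);
        PartitionedCache features = new PartitionedCache(2);
        items.write("1", row("name", "widget"));
        items.write("2", row("name", "gadget"));
        features.write("1", row("feature", "blue"));
        features.write("1", row("feature", "small"));

        // Step 2: each partition is read back and joined independently (e.g. one per shard).
        for (int p = 0; p < 2; p++) {
            System.out.println("partition " + p + ": " + joinPartition(items, features, p, "features"));
        }
    }
}
```

In this sketch a delta update would amount to rewriting only the cache whose source changed and then rerunning the join step; the real patch may handle this differently.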