[ 
https://issues.apache.org/jira/browse/SOLR-2096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16067416#comment-16067416
 ] 

Amit Nithian commented on SOLR-2096:
------------------------------------

Blast from the past - I think we can close this.

> DIH should be able to read data directly from HDFS for indexing
> ---------------------------------------------------------------
>
>                 Key: SOLR-2096
>                 URL: https://issues.apache.org/jira/browse/SOLR-2096
>             Project: Solr
>          Issue Type: New Feature
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4.1
>            Reporter: Amit Nithian
>            Priority: Minor
>             Fix For: 4.9, 6.0
>
>         Attachments: hdfs_reader.tar
>
>
> DIH doesn't support reading from the hdfs:// protocol, which makes it hard to
> index data generated by a MapReduce (M/R) job. The attached tarball contains a
> subclass of URLDataSource along with an HDFSReader that adds this support.
> The data is assumed to be in text format that the LineEntityProcessor can
> process.
> Here is an example DIH-Config snippet:
> <dataConfig>
>   <dataSource name="queryData"
>               type="org.apache.solr.handler.dataimport.hdfs.HDFSDataSource"
>               baseUrl="hdfs://<YOURSERVER>:9000/" encoding="UTF-8"
>               connectionTimeout="5000" readTimeout="10000"/>
>   <document name="autoSuggester">
>     <entity name="jc" processor="LineEntityProcessor"
>             url="<YOUR FOLDER>/part*" dataSource="queryData">
>       <!-- Field mappings here if necessary -->
>     </entity>
>   </document>
> </dataConfig>
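
For illustration, here is a rough sketch of how the field-mapping placeholder in the snippet might be filled in. LineEntityProcessor emits each input line in a column named rawLine, so a single field element is enough to map it to a Solr field. The host name, the folder path, and the target field name "suggestion" are assumptions made up for this example, and the HDFSDataSource class comes from the attached tarball rather than from a Solr release.

```xml
<dataConfig>
  <!-- Assumed values: namenode.example.com and the folder below are placeholders. -->
  <dataSource name="queryData"
              type="org.apache.solr.handler.dataimport.hdfs.HDFSDataSource"
              baseUrl="hdfs://namenode.example.com:9000/" encoding="UTF-8"
              connectionTimeout="5000" readTimeout="10000"/>
  <document name="autoSuggester">
    <entity name="jc" processor="LineEntityProcessor"
            url="user/example/output/part*" dataSource="queryData">
      <!-- rawLine is the column LineEntityProcessor produces for each line;
           "suggestion" is a hypothetical field in the Solr schema. -->
      <field column="rawLine" name="suggestion"/>
    </entity>
  </document>
</dataConfig>
```

If the M/R output lines carry more than one value (for example tab-separated columns), a transformer on the entity would be needed to split rawLine before mapping; that is left out here.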



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
