[ 
https://issues.apache.org/jira/browse/BEAM-3947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16456285#comment-16456285
 ] 

Cao Manh Dat commented on BEAM-3947:
------------------------------------

After looking at the current state, I think we need to agree on the goal of 
this issue.

If we just want the pipeline to be able to read from Solr, then the current 
code is fine: it can already read data from Solr 5.x, 6.x and 7.x. That is 
because all of Beam's current uses of SolrJ (the client for a Solr cluster) 
are very basic, e.g.:
 * Parsing data from ZooKeeper to find out where the Solr nodes live
 * Calling a few HTTP APIs that have not changed since Solr 5.x
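To make the first point concrete: SolrCloud registers each live node in 
ZooKeeper under /live_nodes with a name such as "127.0.0.1:8983_solr", i.e. 
host and port, an underscore, then the URL-encoded context path. The helper 
below is a stdlib-only sketch, assuming that naming convention, of how such a 
name maps to the HTTP base URL that requests are sent to. SolrNodeUrls is a 
hypothetical name, not part of Beam or SolrJ; SolrJ performs an equivalent 
conversion internally (for example in ZkStateReader).

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical helper (not Beam or SolrJ API): converts a SolrCloud live-node
 * name, as found under /live_nodes in ZooKeeper, into an HTTP base URL.
 */
public final class SolrNodeUrls {
  private SolrNodeUrls() {}

  public static String baseUrl(String nodeName, String scheme) {
    // Assumed format: "<host>:<port>_<url-encoded context path>"
    int sep = nodeName.indexOf('_');
    if (sep < 0) {
      throw new IllegalArgumentException("Not a live node name: " + nodeName);
    }
    String hostAndPort = nodeName.substring(0, sep);
    // Everything after the first '_' is the URL-encoded context path (e.g. "solr").
    String path = URLDecoder.decode(nodeName.substring(sep + 1), StandardCharsets.UTF_8);
    return scheme + "://" + hostAndPort + (path.isEmpty() ? "" : "/" + path);
  }

  public static void main(String[] args) {
    // Prints the base URL a client would use for /select, /admin/... requests.
    System.out.println(baseUrl("127.0.0.1:8983_solr", "http"));
  }
}
```

Because this convention and the basic HTTP endpoints are the same across 
Solr 5.x to 7.x, code at this level does not need to change between versions.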

It seems that we should instead focus on using the new features of Solr 6.x 
and 7.x (we may or may not need to update SolrJ for this):
 * Support the "/export" handler. It would make SolrIO significantly faster, 
since all documents are streamed back in a single response and retrieving 
document fields is much cheaper than it is now (column-oriented docValues vs. 
row-oriented stored fields).
 * Make BoundedSolrSource.split able to split the source into arbitrarily 
small parts.
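A rough sketch of how the two ideas could fit together: split a key range into 
N contiguous half-open sub-ranges and issue one /export request per sub-range, 
each restricted by a range filter query. This assumes a numeric key field 
(hypothetically named "id_l") whose bounds are known in advance and which has 
docValues enabled; /export requires explicit fl and sort parameters, and every 
field they name must have docValues. ExportSplitSketch and its methods are 
hypothetical and are not the actual BoundedSolrSource API.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not Beam's BoundedSolrSource API): split a numeric key
 * range into contiguous sub-ranges and build one Solr /export URL per sub-range.
 */
public final class ExportSplitSketch {
  private ExportSplitSketch() {}

  /**
   * Splits the half-open range [min, max) into at most n contiguous, non-empty
   * half-open ranges whose sizes differ by at most one. Assumes max - min fits in a long.
   */
  public static List<long[]> splitRange(long min, long max, int n) {
    if (n <= 0 || max <= min) {
      throw new IllegalArgumentException("need n > 0 and max > min");
    }
    long span = max - min;
    long base = span / n;
    long rem = span % n;
    List<long[]> ranges = new ArrayList<>();
    for (int i = 0; i < n; i++) {
      // The first `rem` ranges get one extra key so the sizes stay balanced.
      long lo = min + i * base + Math.min(i, rem);
      long hi = min + (i + 1) * base + Math.min(i + 1, rem);
      if (hi > lo) { // skip empty ranges when n > span
        ranges.add(new long[] {lo, hi});
      }
    }
    return ranges;
  }

  /** Builds an /export request URL that streams back only keys in [lo, hi). */
  public static String exportUrl(
      String collectionUrl, String keyField, long lo, long hi, String fl) {
    // Range query with inclusive lower bound and exclusive upper bound: [lo TO hi}
    String fq = keyField + ":[" + lo + " TO " + hi + "}";
    return collectionUrl + "/export"
        + "?q=" + encode("*:*")
        + "&fq=" + encode(fq)
        + "&fl=" + encode(fl)
        + "&sort=" + encode(keyField + " asc");
  }

  private static String encode(String s) {
    return URLEncoder.encode(s, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    String collection = "http://127.0.0.1:8983/solr/my_collection"; // hypothetical
    for (long[] r : splitRange(0, 10, 3)) {
      System.out.println(exportUrl(collection, "id_l", r[0], r[1], "id_l,title_s"));
    }
  }
}
```

Because each sub-range is independent, a split routine built this way could 
keep subdividing a range until bundles reach a desired size, rather than being 
limited to one bundle per shard or replica.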

 

 

> Add support for Solr 6.x/7.x
> ----------------------------
>
>                 Key: BEAM-3947
>                 URL: https://issues.apache.org/jira/browse/BEAM-3947
>             Project: Beam
>          Issue Type: Improvement
>          Components: io-java-solr
>            Reporter: Ismaël Mejía
>            Assignee: Cao Manh Dat
>            Priority: Minor
>
> The initial PR on Solr was based on Solr 6.x; however, at that time we 
> supported Java 7, so I insisted on moving it to Solr 5.x (which is Java 7 
> compatible). This issue is to add support for multiple versions of Solr, 
> ideally in a single module.
> Note that I was able to recover the original Solr 6.x code created by 
> [~caomanhdat] here (there are some differences in the way the split was 
> calculated, and maybe some other minor things):
> [https://github.com/iemejia/beam/blob/recover-solrio/sdks/java/io/solr/pom.xml]
> This issue does not cover support for Solr 7, but it would be great if it 
> could be added as part of this work.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
