[ https://issues.apache.org/jira/browse/NUTCH-2739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16976626#comment-16976626 ]

ASF GitHub Bot commented on NUTCH-2739:
---------------------------------------

sebastian-nagel commented on issue #484: NUTCH-2739 : Upgrade ES and migrate to 
REST client
URL: https://github.com/apache/nutch/pull/484#issuecomment-555061506
 
 
   > This will mock the client itself. But we need to mock the server with requests and responses. So can we go ahead and not do the tests at all?
   
   Well, the previous non-REST test implemented a client which did not send 
anything to the server but just returned a successful response or (if 
`clusterSaturated` was set to true) a temporary failure.
   
   But I'm OK with removing the test class if it's too much work to rewrite it for the REST client.
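The approach of the previous test can be sketched without any Elasticsearch dependency: a stub client that never contacts a server and only answers with a canned success or, when `clusterSaturated` is set, a temporary failure. This is a minimal sketch under that assumption; `FakeBulkClient`, `Result`, and `bulk(...)` are hypothetical names, not the actual Nutch test code or the Elasticsearch client API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical test double for an Elasticsearch client: it sends nothing
 * over the network and only returns a canned outcome.
 */
class FakeBulkClient {

    /** Simulated outcome of a bulk request. */
    enum Result { SUCCESS, TEMPORARY_FAILURE }

    // When true, every request is answered as if the cluster were saturated.
    private boolean clusterSaturated;

    // IDs of all documents "sent" so far, so a test can inspect them.
    private final List<String> received = new ArrayList<>();

    void setClusterSaturated(boolean clusterSaturated) {
        this.clusterSaturated = clusterSaturated;
    }

    /** Records the document IDs and returns success or a temporary failure. */
    Result bulk(List<String> docIds) {
        received.addAll(docIds);
        return clusterSaturated ? Result.TEMPORARY_FAILURE : Result.SUCCESS;
    }

    List<String> received() {
        return received;
    }

    public static void main(String[] args) {
        FakeBulkClient client = new FakeBulkClient();
        System.out.println(client.bulk(Arrays.asList("doc1", "doc2"))); // SUCCESS
        client.setClusterSaturated(true);
        System.out.println(client.bulk(Arrays.asList("doc3")));         // TEMPORARY_FAILURE
        System.out.println(client.received().size());                   // 3
    }
}
```

A stub like this only tests how the indexer reacts to success and retryable failures; it cannot verify the HTTP requests the REST client would actually send, which is the concern raised in the quoted question.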
   
   I've tested the PR, but in the initial rounds indexing failed for about 50% of the pages/documents:
   ```
   [2019-11-18T12:56:46,803][DEBUG][o.e.a.b.TransportShardBulkAction] [vagran] 
[nutch][0] failed to execute bulk item (index) index 
{[nutch][_doc][http://nutch.apache.org/apidocs/apidocs-2.2.1/index.html], 
source[{"{date=Mon Jun 09 15:03:28 CEST 2014, type=[text/html, text, html], 
title=apache-nutch 2.2.1 API, 
url=http://nutch.apache.org/apidocs/apidocs-2.2.1/index.html, 
content=apache-nutch 2.2.1 API\n<H2> Frame Alert</H2> <P> This document is 
designed to be viewed using the frames feature. If you see this message, you 
are using a non-frame-capable web client. <BR> Link to<A 
HREF=\"overview-summary.html\">Non-frame version.</A>\n, search=apache-nutch 
2.2.1 API, tstamp=Thu Jul 26 16:50:11 CEST 2018, segment=20180726164932, 
digest=8b8785f9cec87c0376a7fa940e0e3a6c, host=nutch.apache.org, boost=1.0, 
id=http://nutch.apache.org/apidocs/apidocs-2.2.1/index.html, lastModified=Mon 
Jun 09 15:03:28 CEST 2014}":"doc"}]}
   ```
   
   I got it fixed by using XContentBuilder to pass the document as JSON to the ES client; you'll find the necessary changes in [this branch](https://github.com/sebastian-nagel/nutch/tree/NUTCH-2739). Also:
   - updated the description of how to upgrade the dependencies in plugin.xml and added a few exclusions for dependencies already provided by Nutch core.
   - changed the default properties in index-writers.xml.template so that the 
indexer-elastic plugin works out-of-the-box with default settings
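In the log above, the whole document appears as a single field name (it looks like the `toString()` of a Java `Map`, e.g. `{date=..., type=[text/html, text, html], ...}`) with the value `"doc"`, so the fields never reached Elasticsearch as a JSON object. In the branch this is fixed with Elasticsearch's XContentBuilder (obtained via `XContentFactory.jsonBuilder()`). The stdlib-only sketch below is *not* that API; it is a hand-rolled stand-in, assuming flat string/number/list fields, that only illustrates the difference between `Map.toString()` and a proper JSON object. `DocToJson` and its methods are hypothetical names.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: serializes a flat field map into one JSON object. */
class DocToJson {

    /** Quotes a string as a JSON string literal, escaping special characters. */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    // Remaining control characters must be escaped as \\uXXXX.
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    /** Serializes one value; collections (multi-valued fields) become arrays. */
    static String value(Object v) {
        if (v == null) return "null";
        if (v instanceof Collection) {
            StringBuilder sb = new StringBuilder("[");
            boolean first = true;
            for (Object item : (Collection<?>) v) {
                if (!first) sb.append(',');
                sb.append(value(item));
                first = false;
            }
            return sb.append(']').toString();
        }
        if (v instanceof Number || v instanceof Boolean) return v.toString();
        return quote(v.toString());
    }

    /** Builds a JSON object with one property per document field. */
    static String toJson(Map<String, ?> fields) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, ?> e : fields.entrySet()) {
            if (!first) sb.append(',');
            sb.append(quote(e.getKey())).append(':').append(value(e.getValue()));
            first = false;
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("title", "apache-nutch 2.2.1 API");
        doc.put("type", Arrays.asList("text/html", "text", "html"));
        doc.put("boost", 1.0f);
        // Map.toString(): {title=apache-nutch 2.2.1 API, type=[text/html, text, html], boost=1.0}
        System.out.println(doc);
        // JSON: {"title":"apache-nutch 2.2.1 API","type":["text/html","text","html"],"boost":1.0}
        System.out.println(toJson(doc));
    }
}
```

In the plugin itself, a library builder such as XContentBuilder is preferable to hand-written serialization, since it also handles types like dates consistently.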
   
   So far, I haven't run any tests at scale. That should be done to make sure we are able to index millions of documents with the given settings.
   
   Please have a look at my changes. Can you integrate them into your branch?
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


> indexer-elastic: Upgrade ES and migrate to REST client
> ------------------------------------------------------
>
>                 Key: NUTCH-2739
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2739
>             Project: Nutch
>          Issue Type: Improvement
>          Components: indexer, plugin
>    Affects Versions: 1.15
>            Reporter: Sebastian Nagel
>            Priority: Major
>             Fix For: 1.17
>
>
> The indexer-elastic plugin is based on Elasticsearch 5.3.0 and should be
> upgraded to the most recent Elasticsearch version (7.3.0 or upwards).
> [TransportClient|https://www.elastic.co/guide/en/elasticsearch/client/java-api/7.3/transport-client.html]
> has been deprecated in ES 7.x and will be removed in 8.x. We should migrate
> to using the [REST client|https://www.elastic.co/guide/en/elasticsearch/client/java-rest/7.3/java-rest-high.html]
> and also check whether this would make the indexer-elastic-rest plugin obsolete.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
