arjansh commented on a change in pull request #229: Upgrade to Elasticsearch 7.3.1 URL: https://github.com/apache/metamodel/pull/229#discussion_r336520030
########## File path: elasticsearch/rest/src/main/java/org/apache/metamodel/elasticsearch/rest/ElasticSearchRestDataSet.java ########## @@ -50,12 +50,15 @@ public ElasticSearchRestDataSet(final RestHighLevelClient client, final SearchRe @Override public void closeNow() { final String scrollId = _searchResponse.getScrollId(); - final ClearScrollRequest clearScrollRequest = new ClearScrollRequest(); - clearScrollRequest.addScrollId(scrollId); - try { - client.clearScroll((ClearScrollRequest) clearScrollRequest, RequestOptions.DEFAULT); - } catch (IOException e) { - logger.warn("Could not clear scroll.", e); + + if (scrollId != null) { + final ClearScrollRequest clearScrollRequest = new ClearScrollRequest(); + clearScrollRequest.addScrollId(scrollId); + try { + client.clearScroll((ClearScrollRequest) clearScrollRequest, RequestOptions.DEFAULT); + } catch (final IOException e) { + logger.warn("Could not clear scroll.", e); + } Review comment: It lives on the server and yes, it is a relatively expensive operation. Typically the scroll id is used when iterating over larger sets of data, so you do a query and get a thousand results, out of which only 400 are returned by the first HTTP request, then you use the scroll id to get the second 400, and you clear the scroll id, once you're done with your `DataSet`. This is a typical use case for which you use the scrolling mechanism and you should not have to clear it often. What we do in MetaModel however is that when updating data using the `DeleteAndInsertBuilder` it first gets all rows to update (resulting in a scroll id, which is also retained on the Elasticsearch cluster), then it deletes all these rows, then it updates them locally (that is the Row objects) and then it inserts them into the actual table. If you update data by quickly in succession calling MetaModel's update mechanism using the `DeleteAndInsertBuilder` it will quickly create a lot of "scrolls" of which MetaModel only allows 500. Within the `DeleteAndInsertBuilder` we however don't know on beforehand if the update query which is fired will only update one (or just a few) records (in case we could opt not to use the scrolling mechanism at all, because of the overhead it introduces) or if it will update thousands of rows. So if we want to optimize that, then we should probably come up with something which I think is out of scope for this PR. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services