[ 
https://issues.apache.org/jira/browse/PIO-102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mars Hall updated PIO-102:
--------------------------
    External issue URL: 
https://github.com/apache/incubator-predictionio/pull/406
           Description: 
Using the new Elasticsearch 5.x REST storage client as the meta storage source 
(`PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH` setup in 
conf/pio-env.sh), I found that once an engine has been trained a certain number 
of times, that the most recent engine instance is no longer retrieved. So, I 
tracked down where those Elasticsearch queries originate.

In the original Elasticsearch 1.x storage client, [the "scroll" pagination 
responses are collected by 
*appending*|https://github.com/apache/incubator-predictionio/blob/release/0.11.0/storage/elasticsearch1/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESUtils.scala#L44]
 them to one another.

In the new Elasticsearch 5.x client, [the "scroll" responses are collected by 
*prepending*|https://github.com/apache/incubator-predictionio/blob/release/0.11.0/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESUtils.scala#L152]
 them to one another.

This out-of-order concatenation breaks [ESEngineInstances 
`getLatestCompleted`|https://github.com/apache/incubator-predictionio/blob/release/0.11.0/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESEngineInstances.scala#L192]
 by erroneously replacing the head of the results with an older engine 
instance, when there are enough engine instances to overflow a single page of 
Elasticsearch hits.

I've observed this buggy behavior after ten trainings, when enough engine 
instances are stored to trigger Elasticsearch's scroll feature.

Pull request: https://github.com/apache/incubator-predictionio/pull/406

  was:
Using the new Elasticsearch 5.x REST storage client as the meta storage source 
(`PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH` setup in 
conf/pio-env.sh), I found that once an engine has been trained a certain number 
of times, that the most recent engine instance is no longer retrieved. So, I 
tracked down where those Elasticsearch queries originate.

In the original Elasticsearch 1.x storage client, [the "scroll" pagination 
responses are collected by 
*appending*|https://github.com/apache/incubator-predictionio/blob/release/0.11.0/storage/elasticsearch1/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESUtils.scala#L44]
 them to one another.

In the new Elasticsearch 5.x client, [the "scroll" responses are collected by 
*prepending*|https://github.com/apache/incubator-predictionio/blob/release/0.11.0/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESUtils.scala#L152]
 them to one another.

This out-of-order concatenation breaks [ESEngineInstances 
`getLatestCompleted`|https://github.com/apache/incubator-predictionio/blob/release/0.11.0/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESEngineInstances.scala#L192]
 by erroneously replacing the head of the results with an older engine 
instance, when there are enough engine instances to overflow a single page of 
Elasticsearch hits.

I've observed this buggy behavior after ten trainings, when enough engine 
instances are stored to trigger Elasticsearch's scroll feature.

I'll be opening a pull request shortly with the super-simple fix.


> ESEngineInstances `getAll` results out of order (Elasticsearch 5.x)
> -------------------------------------------------------------------
>
>                 Key: PIO-102
>                 URL: https://issues.apache.org/jira/browse/PIO-102
>             Project: PredictionIO
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.11.0-incubating
>            Reporter: Mars Hall
>            Assignee: Mars Hall
>
> Using the new Elasticsearch 5.x REST storage client as the meta storage 
> source (`PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH` setup in 
> conf/pio-env.sh), I found that once an engine has been trained a certain 
> number of times, that the most recent engine instance is no longer retrieved. 
> So, I tracked down where those Elasticsearch queries originate.
> In the original Elasticsearch 1.x storage client, [the "scroll" pagination 
> responses are collected by 
> *appending*|https://github.com/apache/incubator-predictionio/blob/release/0.11.0/storage/elasticsearch1/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESUtils.scala#L44]
>  them to one another.
> In the new Elasticsearch 5.x client, [the "scroll" responses are collected by 
> *prepending*|https://github.com/apache/incubator-predictionio/blob/release/0.11.0/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESUtils.scala#L152]
>  them to one another.
> This out-of-order concatenation breaks [ESEngineInstances 
> `getLatestCompleted`|https://github.com/apache/incubator-predictionio/blob/release/0.11.0/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESEngineInstances.scala#L192]
>  by erroneously replacing the head of the results with an older engine 
> instance, when there are enough engine instances to overflow a single page of 
> Elasticsearch hits.
> I've observed this buggy behavior after ten trainings, when enough engine 
> instances are stored to trigger Elasticsearch's scroll feature.
> Pull request: https://github.com/apache/incubator-predictionio/pull/406



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to