[ 
https://issues.apache.org/jira/browse/BEAM-7081?focusedWorklogId=230072&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-230072
 ]

ASF GitHub Bot logged work on BEAM-7081:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 19/Apr/19 11:34
            Start Date: 19/Apr/19 11:34
    Worklog Time Spent: 10m 
      Work Description: romanvanderkrogt commented on pull request #8359: 
[BEAM-7081] MongoDbIO: produce correct ranges for splitkeys
URL: https://github.com/apache/beam/pull/8359
 
 
   When there is only a single split key, `splitKeysToFilters` does not compute 
the correct result. For example, if the split key is "_id: 56", only the range 
filter "_id lower than or equal to 56" is produced. It should also include a 
filter "_id greater than 56". If this happens, the resulting PCollection 
includes only the data until the first split; the remainder is not included.
   
   While producing the fix for this issue, I noticed that `splitKeysToMatch` 
suffers from the same issue, and additionally, that the last split is also 
produced incorrectly. If the splits are, say, 56, 109, and 256, it produces 
<=56, 56--109 and >= 109, instead of <=56, 56--109, 109--256 and >=256. A fix 
for both issues is included as well.
   
   ------------------------
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
    - [x] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
    - [x] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
    - [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   Post-Commit Tests Status (on master branch)
   
------------------------------------------------------------------------------------------------
   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | --- | --- | --- | ---
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)<br>[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)<br>[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/)<br>[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/)
 | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)
 <br> [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PreCommit_Python_PVR_Flink_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Python_PVR_Flink_Cron/lastCompletedBuild/)
 | --- | --- | ---
   
   Pre-Commit Tests Status (on master branch)
   
------------------------------------------------------------------------------------------------
   
   --- |Java | Python | Go | Website
   --- | --- | --- | --- | ---
   Non-portable | [![Build 
Status](https://builds.apache.org/job/beam_PreCommit_Java_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Java_Cron/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PreCommit_Python_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Python_Cron/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PreCommit_Go_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Go_Cron/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PreCommit_Website_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Website_Cron/lastCompletedBuild/)
 
   Portable | --- | [![Build 
Status](https://builds.apache.org/job/beam_PreCommit_Portable_Python_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Portable_Python_Cron/lastCompletedBuild/)
 | --- | ---
   
   See 
[.test-infra/jenkins/README](https://github.com/apache/beam/blob/master/.test-infra/jenkins/README.md)
 for trigger phrase, status and link of all Jenkins jobs.
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

            Worklog Id:     (was: 230072)
            Time Spent: 10m
    Remaining Estimate: 0h

> MongoDbIO.splitKeysToFilters returns incorrect filters with only one splitkey
> -----------------------------------------------------------------------------
>
>                 Key: BEAM-7081
>                 URL: https://issues.apache.org/jira/browse/BEAM-7081
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-mongodb
>            Reporter: Roman van der Krogt
>            Priority: Critical
>             Fix For: 2.13.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> When there is only a single split key, splitKeysToFilters does not compute 
> the correct result. For example, if the split key is "_id: 56", only the 
> range filter "_id lower than or equal to 56" is produced. It should also 
> include a filter "_id greater than 56". If this happens, the resulting 
> PCollection includes only the data until the first split; the remainder is 
> not included.
>  
> This can be remedied with the following few lines:
>  
> {{if (i == 0) {}}
> {{  // this is the first split in the list, the filter defines}}
> {{  // the range from the beginning up to this split}}
> {{  rangeFilter = String.format("\{ $and: [ {\"_id\":{$lte:%s}}",}}
> {{  getFilterString(idType, splitKey));}}
> {{  filters.add(formatFilter(rangeFilter, additionalFilter));}}
> {{{color:#f79232}  {color}{color:#14892c}// If there is only one split, also 
> generate a range from the split to the end{color}}}
> {{{color:#14892c}  if ( splitKeys.size() == 1) {{color}}}
> {{{color:#14892c}    rangeFilter = String.format("\{ $and: [ 
> {\"_id\":{$gt:%s}}",getFilterString(idType, splitKey));{color}}}
> {{{color:#14892c}    filters.add(formatFilter(rangeFilter, 
> additionalFilter));{color}}}
> {{{color:#14892c}  }{color}}}
> {{}}}
>  
> The corresponding test case in MongoDbIOTest should be updated to the 
> following:
>  
> {{@Test}}
> {{public void testSplitIntoFilters() throws Exception {}}
> {{  // A single split will result in two filters}}
> {{  ArrayList<Document> documents = new ArrayList<>();}}
> {{  documents.add(new Document("_id", 56));}}
> {{  List<String> filters = 
> MongoDbIO.BoundedMongoDbSource.splitKeysToFilters(documents, null);}}
> {{  assertEquals(2, filters.size());}}
> {{  assertEquals("\{ $and: [ {\"_id\":{$lte:ObjectId(\"56\")}} ]}", 
> filters.get(0));}}
> {{  assertEquals("\{ $and: [ {\"_id\":{$gt:ObjectId(\"56\")}} ]}", 
> filters.get(1));}}
> {{  // Add two more splits; now we should have 4 filters}}
> {{  documents.add(new Document("_id", 109));}}
> {{  documents.add(new Document("_id", 256));}}
> {{  filters = MongoDbIO.BoundedMongoDbSource.splitKeysToFilters(documents, 
> null);}}
> {{  assertEquals(4, filters.size());}}
> {{  assertEquals("\{ $and: [ {\"_id\":{$lte:ObjectId(\"56\")}} ]}", 
> filters.get(0));}}
> {{  assertEquals("{ $and: [ 
> {\"_id\":({$gt:ObjectId(\"56\"),$lte:ObjectId(\"109\")}} ]}",}}
> {{ filters.get(1));}}
> {{  assertEquals("\{ $and: [ 
> {\"_id\":{$gt:ObjectId(\"109\"),$lte:ObjectId(\"256\")}} ]}",}}
> {{ filters.get(2));}}
> {{  assertEquals("\{ $and: [ {\"_id\":{$gt:ObjectId(\"256\")}} ]}", 
> filters.get(3));}}
> {{}}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to