[
https://issues.apache.org/jira/browse/BEAM-7081?focusedWorklogId=230139&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-230139
]
ASF GitHub Bot logged work on BEAM-7081:
----------------------------------------
Author: ASF GitHub Bot
Created on: 19/Apr/19 14:51
Start Date: 19/Apr/19 14:51
Worklog Time Spent: 10m
Work Description: jbonofre commented on issue #8359: [BEAM-7081]
MongoDbIO: produce correct ranges for splitkeys
URL: https://github.com/apache/beam/pull/8359#issuecomment-484919569
Let me do a first review round and I will let you know. I will check the
failures as well.
Issue Time Tracking
-------------------
Worklog Id: (was: 230139)
Time Spent: 50m (was: 40m)
> MongoDbIO.splitKeysToFilters returns incorrect filters with only one splitkey
> -----------------------------------------------------------------------------
>
> Key: BEAM-7081
> URL: https://issues.apache.org/jira/browse/BEAM-7081
> Project: Beam
> Issue Type: Bug
> Components: io-java-mongodb
> Reporter: Roman van der Krogt
> Priority: Critical
> Fix For: 2.13.0
>
> Time Spent: 50m
> Remaining Estimate: 0h
>
> When there is only a single split key, splitKeysToFilters does not compute
> the correct result. For example, if the split key is "_id: 56", only the
> range filter "_id lower than or equal to 56" is produced. It should also
> produce a filter "_id greater than 56". Without that second filter, the
> resulting PCollection contains only the data up to the first split; the
> remainder is silently dropped.
>
> This can be remedied with the following few lines:
>
>     if (i == 0) {
>       // this is the first split in the list, the filter defines
>       // the range from the beginning up to this split
>       rangeFilter = String.format("{ $and: [ {\"_id\":{$lte:%s}}",
>           getFilterString(idType, splitKey));
>       filters.add(formatFilter(rangeFilter, additionalFilter));
>       // If there is only one split, also generate a range from the split to the end
>       if (splitKeys.size() == 1) {
>         rangeFilter = String.format("{ $and: [ {\"_id\":{$gt:%s}}",
>             getFilterString(idType, splitKey));
>         filters.add(formatFilter(rangeFilter, additionalFilter));
>       }
>     }
>
> The corresponding test case in MongoDbIOTest should be updated to the
> following:
>
>     @Test
>     public void testSplitIntoFilters() throws Exception {
>       // A single split will result in two filters
>       ArrayList<Document> documents = new ArrayList<>();
>       documents.add(new Document("_id", 56));
>       List<String> filters =
>           MongoDbIO.BoundedMongoDbSource.splitKeysToFilters(documents, null);
>       assertEquals(2, filters.size());
>       assertEquals("{ $and: [ {\"_id\":{$lte:ObjectId(\"56\")}} ]}", filters.get(0));
>       assertEquals("{ $and: [ {\"_id\":{$gt:ObjectId(\"56\")}} ]}", filters.get(1));
>       // Add two more splits; now we should have 4 filters
>       documents.add(new Document("_id", 109));
>       documents.add(new Document("_id", 256));
>       filters = MongoDbIO.BoundedMongoDbSource.splitKeysToFilters(documents, null);
>       assertEquals(4, filters.size());
>       assertEquals("{ $and: [ {\"_id\":{$lte:ObjectId(\"56\")}} ]}", filters.get(0));
>       assertEquals(
>           "{ $and: [ {\"_id\":{$gt:ObjectId(\"56\"),$lte:ObjectId(\"109\")}} ]}",
>           filters.get(1));
>       assertEquals(
>           "{ $and: [ {\"_id\":{$gt:ObjectId(\"109\"),$lte:ObjectId(\"256\")}} ]}",
>           filters.get(2));
>       assertEquals("{ $and: [ {\"_id\":{$gt:ObjectId(\"256\")}} ]}", filters.get(3));
>     }
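
For readers who want to see the invariant behind the report in isolation (N split keys should yield N + 1 range filters that together cover every _id), here is a minimal standalone sketch. It is not the Beam implementation: the class SplitKeyFilters and the method splitKeysToRangeFilters are hypothetical names, the keys are assumed to be already rendered as filter strings such as ObjectId("56"), and the additionalFilter / formatFilter handling from MongoDbIO is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Apache Beam.
class SplitKeyFilters {

  /**
   * Turns N sorted split keys into N + 1 range filters covering all ids.
   * Each key is assumed to be pre-rendered, e.g. ObjectId("56").
   */
  public static List<String> splitKeysToRangeFilters(List<String> splitKeys) {
    List<String> filters = new ArrayList<>();
    if (splitKeys.isEmpty()) {
      // No split keys: nothing to partition on.
      return filters;
    }
    String lowerBound = null; // previous split key, exclusive lower bound
    for (String key : splitKeys) {
      if (lowerBound == null) {
        // First split: everything from the beginning up to this key.
        filters.add(String.format("{ $and: [ {\"_id\":{$lte:%s}} ]}", key));
      } else {
        // Middle range: (previous key, this key].
        filters.add(
            String.format("{ $and: [ {\"_id\":{$gt:%s,$lte:%s}} ]}", lowerBound, key));
      }
      lowerBound = key;
    }
    // Trailing range: everything after the last key. Emitting it
    // unconditionally is what covers the single-split-key case.
    filters.add(String.format("{ $and: [ {\"_id\":{$gt:%s}} ]}", lowerBound));
    return filters;
  }
}
```

Because the trailing "greater than" range is added after the loop rather than inside a branch for the last index, a list with one split key goes through the same path as a longer list, so the case described in this issue needs no special handling in this sketch.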