Roman van der Krogt created BEAM-7081:
-----------------------------------------
Summary: MongoDbIO.splitKeysToFilters returns incorrect filters
with only one split key
Key: BEAM-7081
URL: https://issues.apache.org/jira/browse/BEAM-7081
Project: Beam
Issue Type: Bug
Components: io-java-mongodb
Reporter: Roman van der Krogt
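When there is only a single split key, splitKeysToFilters does not compute the
correct result. For example, if the split key is "_id: 56", only the range
filter "_id less than or equal to 56" is produced; the filter "_id greater
than 56" is missing. The single key is handled by the first-split (i == 0)
branch, which emits only the lower range, and no later iteration emits the
open-ended upper range. As a result, the PCollection contains only the
documents up to and including the split key, and every document after it is
silently dropped.
This can be remedied by adding the following lines to the i == 0 branch (the
new lines are the inner if block):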
{code:java}
if (i == 0) {
  // this is the first split in the list, the filter defines
  // the range from the beginning up to this split
  rangeFilter = String.format("{ $and: [ {\"_id\":{$lte:%s}}",
      getFilterString(idType, splitKey));
  filters.add(formatFilter(rangeFilter, additionalFilter));
  // Added by this fix: if there is only one split, also
  // generate a range from the split to the end
  if (splitKeys.size() == 1) {
    rangeFilter = String.format("{ $and: [ {\"_id\":{$gt:%s}}",
        getFilterString(idType, splitKey));
    filters.add(formatFilter(rangeFilter, additionalFilter));
  }
}
{code}
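An alternative structure avoids the special case altogether: emit one range per split key inside the loop (everything up to the first key, then each interval between consecutive keys), and emit the open-ended upper range exactly once after the loop, so N split keys always yield N + 1 filters. The sketch below is hypothetical and dependency-free: the class name SplitFilterSketch and its List<String> signature are illustrative assumptions, not Beam's API, and idType handling, getFilterString and the additional filter are left out. Keys are always wrapped in ObjectId(...) to match the strings the test expects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical, dependency-free sketch of turning sorted split keys into
 * MongoDB range filters. Not Beam's implementation: keys are plain strings,
 * and idType / additionalFilter handling is omitted.
 */
public class SplitFilterSketch {

  static List<String> splitKeysToFilters(List<String> splitKeys) {
    List<String> filters = new ArrayList<>();
    String lowerBound = null; // previous split key, or null before the first one
    for (String splitKey : splitKeys) {
      if (lowerBound == null) {
        // First range: everything up to and including the first split key.
        filters.add(String.format(
            "{ $and: [ {\"_id\":{$lte:ObjectId(\"%s\")}} ]}", splitKey));
      } else {
        // Middle range: (previous key, current key].
        filters.add(String.format(
            "{ $and: [ {\"_id\":{$gt:ObjectId(\"%s\"),$lte:ObjectId(\"%s\")}} ]}",
            lowerBound, splitKey));
      }
      lowerBound = splitKey;
    }
    if (lowerBound != null) {
      // Last range: everything after the final split key. Emitting it after
      // the loop means a single split key still yields two filters.
      filters.add(String.format(
          "{ $and: [ {\"_id\":{$gt:ObjectId(\"%s\")}} ]}", lowerBound));
    }
    return filters;
  }

  public static void main(String[] args) {
    System.out.println(splitKeysToFilters(Arrays.asList("56")));
    System.out.println(splitKeysToFilters(Arrays.asList("56", "109", "256")));
  }
}
```

With an empty key list this sketch returns no filters; how a caller should handle that case (for example, by reading the collection without splitting) is an assumption outside the scope of this issue.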
The corresponding test case in MongoDbIOTest should be updated to the following:
{code:java}
@Test
public void testSplitIntoFilters() throws Exception {
  // A single split will result in two filters
  ArrayList<Document> documents = new ArrayList<>();
  documents.add(new Document("_id", 56));
  List<String> filters =
      MongoDbIO.BoundedMongoDbSource.splitKeysToFilters(documents, null);
  assertEquals(2, filters.size());
  assertEquals("{ $and: [ {\"_id\":{$lte:ObjectId(\"56\")}} ]}", filters.get(0));
  assertEquals("{ $and: [ {\"_id\":{$gt:ObjectId(\"56\")}} ]}", filters.get(1));

  // Add two more splits; now we should have 4 filters
  documents.add(new Document("_id", 109));
  documents.add(new Document("_id", 256));
  filters = MongoDbIO.BoundedMongoDbSource.splitKeysToFilters(documents, null);
  assertEquals(4, filters.size());
  assertEquals("{ $and: [ {\"_id\":{$lte:ObjectId(\"56\")}} ]}", filters.get(0));
  assertEquals(
      "{ $and: [ {\"_id\":{$gt:ObjectId(\"56\"),$lte:ObjectId(\"109\")}} ]}",
      filters.get(1));
  assertEquals(
      "{ $and: [ {\"_id\":{$gt:ObjectId(\"109\"),$lte:ObjectId(\"256\")}} ]}",
      filters.get(2));
  assertEquals("{ $and: [ {\"_id\":{$gt:ObjectId(\"256\")}} ]}", filters.get(3));
}
{code}
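The string assertions in the test pin down the exact filter text; the property they protect is that the ranges partition the _id space, so every document matches exactly one filter. The hypothetical sketch below models each filter as a range (lower, upper] over integer ids, with null meaning unbounded; SplitCoverageCheck, Range, ranges and matches are illustrative names only, not part of Beam. It also reproduces the reported bug: with only the "_id <= 56" range, an id such as 57 matches no filter and would be dropped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch: checks that the ranges derived from split keys
 * cover every id exactly once. Integer ids stand in for real _id values.
 */
public class SplitCoverageCheck {

  /** A range (lowerExclusive, upperInclusive]; null means unbounded. */
  static final class Range {
    final Integer lowerExclusive;
    final Integer upperInclusive;

    Range(Integer lowerExclusive, Integer upperInclusive) {
      this.lowerExclusive = lowerExclusive;
      this.upperInclusive = upperInclusive;
    }

    boolean contains(int id) {
      return (lowerExclusive == null || id > lowerExclusive)
          && (upperInclusive == null || id <= upperInclusive);
    }
  }

  /** Builds splitKeys.size() + 1 ranges from sorted split keys. */
  static List<Range> ranges(List<Integer> sortedSplitKeys) {
    List<Range> result = new ArrayList<>();
    Integer previous = null;
    for (Integer key : sortedSplitKeys) {
      result.add(new Range(previous, key));
      previous = key;
    }
    result.add(new Range(previous, null)); // open-ended tail after the last key
    return result;
  }

  /** Counts how many ranges contain the id; a correct split gives exactly 1. */
  static int matches(List<Range> ranges, int id) {
    int count = 0;
    for (Range r : ranges) {
      if (r.contains(id)) {
        count++;
      }
    }
    return count;
  }

  public static void main(String[] args) {
    List<Range> fixed = ranges(Arrays.asList(56));
    List<Range> buggy = Arrays.asList(new Range(null, 56)); // only "_id <= 56"
    for (int id : new int[] {1, 56, 57, 1000}) {
      System.out.println(
          id + ": fixed=" + matches(fixed, id) + ", buggy=" + matches(buggy, id));
    }
  }
}
```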
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)