Roman van der Krogt created BEAM-7081:
-----------------------------------------

             Summary: MongoDbIO.splitKeysToFilters returns incorrect filters 
with only one splitkey
                 Key: BEAM-7081
                 URL: https://issues.apache.org/jira/browse/BEAM-7081
             Project: Beam
          Issue Type: Bug
          Components: io-java-mongodb
            Reporter: Roman van der Krogt


When there is only a single split key, splitKeysToFilters does not compute the 
correct result. For example, if the split key is "_id: 56", only the range 
filter "_id less than or equal to 56" is produced; the filter "_id greater 
than 56" is missing. As a result, the PCollection contains only the data up 
to the first split, and the remainder is never read.
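
The effect can be illustrated without MongoDB. The sketch below is a simplified model, not Beam code: it treats each range filter as a predicate over integer ids 1..100 and counts how many ids are covered by the single filter versus the pair of filters. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.IntPredicate;
import java.util.stream.IntStream;

public class SplitCoverageSketch {
  // Counts the ids in [1, 100] matched by at least one of the given ranges.
  static long covered(List<IntPredicate> ranges) {
    return IntStream.rangeClosed(1, 100)
        .filter(id -> ranges.stream().anyMatch(r -> r.test(id)))
        .count();
  }

  public static void main(String[] args) {
    int splitKey = 56;
    IntPredicate upToSplit = id -> id <= splitKey;  // the filter that is produced
    IntPredicate afterSplit = id -> id > splitKey;  // the filter that is missing

    // Only ids 1..56 are covered by the single filter.
    System.out.println(covered(Arrays.asList(upToSplit)));
    // Both filters together cover every id.
    System.out.println(covered(Arrays.asList(upToSplit, afterSplit)));
  }
}
```

With the single filter, 56 of the 100 ids are covered; with both filters, all 100 are.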

 

This can be remedied by adding a few lines to the first-split branch (the 
nested check on splitKeys.size() == 1 is new):

 

    if (i == 0) {
      // this is the first split in the list, the filter defines
      // the range from the beginning up to this split
      rangeFilter = String.format("{ $and: [ {\"_id\":{$lte:%s}}",
          getFilterString(idType, splitKey));
      filters.add(formatFilter(rangeFilter, additionalFilter));
      // If there is only one split, also generate a range from the split to the end
      if (splitKeys.size() == 1) {
        rangeFilter = String.format("{ $and: [ {\"_id\":{$gt:%s}}",
            getFilterString(idType, splitKey));
        filters.add(formatFilter(rangeFilter, additionalFilter));
      }
    }
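
To try the change outside Beam, the whole loop can be modeled in a few lines. This is a minimal sketch under stated assumptions, not the MongoDbIO implementation: keys are plain strings always rendered as ObjectId("..."), there is no additional filter, and the branches other than the first are inferred from the filters expected by the test in this issue. SplitFilterSketch and its helpers are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SplitFilterSketch {
  // Hypothetical helpers that render the three range shapes as filter strings.
  static String lte(String k) {
    return "{ $and: [ {\"_id\":{$lte:ObjectId(\"" + k + "\")}} ]}";
  }

  static String gt(String k) {
    return "{ $and: [ {\"_id\":{$gt:ObjectId(\"" + k + "\")}} ]}";
  }

  static String between(String lo, String hi) {
    return "{ $and: [ {\"_id\":{$gt:ObjectId(\"" + lo + "\"),$lte:ObjectId(\""
        + hi + "\")}} ]}";
  }

  // Simplified stand-in for splitKeysToFilters: n split keys yield n + 1 ranges.
  static List<String> toFilters(List<String> splitKeys) {
    List<String> filters = new ArrayList<>();
    String lowerBound = null; // previous split key in the iteration
    for (int i = 0; i < splitKeys.size(); i++) {
      String key = splitKeys.get(i);
      if (i == 0) {
        // first split: range from the beginning up to this key
        filters.add(lte(key));
        if (splitKeys.size() == 1) {
          // the proposed fix: a lone split also needs the open-ended range
          filters.add(gt(key));
        }
      } else {
        // range between the previous split and this one
        filters.add(between(lowerBound, key));
        if (i == splitKeys.size() - 1) {
          // last split: add the range from this key to the end
          filters.add(gt(key));
        }
      }
      lowerBound = key;
    }
    return filters;
  }

  public static void main(String[] args) {
    // One split key produces two filters, three split keys produce four.
    toFilters(Arrays.asList("56")).forEach(System.out::println);
    toFilters(Arrays.asList("56", "109", "256")).forEach(System.out::println);
  }
}
```

Removing the nested size check from the first branch reproduces the reported behavior: a single split key then yields only the $lte filter.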

 

The corresponding test case in MongoDbIOTest should be updated to the following:

 

    @Test
    public void testSplitIntoFilters() throws Exception {
      // A single split will result in two filters
      ArrayList<Document> documents = new ArrayList<>();
      documents.add(new Document("_id", 56));
      List<String> filters =
          MongoDbIO.BoundedMongoDbSource.splitKeysToFilters(documents, null);
      assertEquals(2, filters.size());
      assertEquals("{ $and: [ {\"_id\":{$lte:ObjectId(\"56\")}} ]}", filters.get(0));
      assertEquals("{ $and: [ {\"_id\":{$gt:ObjectId(\"56\")}} ]}", filters.get(1));

      // Add two more splits; now we should have 4 filters
      documents.add(new Document("_id", 109));
      documents.add(new Document("_id", 256));
      filters = MongoDbIO.BoundedMongoDbSource.splitKeysToFilters(documents, null);
      assertEquals(4, filters.size());
      assertEquals("{ $and: [ {\"_id\":{$lte:ObjectId(\"56\")}} ]}", filters.get(0));
      assertEquals(
          "{ $and: [ {\"_id\":{$gt:ObjectId(\"56\"),$lte:ObjectId(\"109\")}} ]}",
          filters.get(1));
      assertEquals(
          "{ $and: [ {\"_id\":{$gt:ObjectId(\"109\"),$lte:ObjectId(\"256\")}} ]}",
          filters.get(2));
      assertEquals("{ $and: [ {\"_id\":{$gt:ObjectId(\"256\")}} ]}", filters.get(3));
    }



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
