Github user ejwhite922 commented on a diff in the pull request:

    https://github.com/apache/incubator-rya/pull/153#discussion_r133803255

    --- Diff: extras/indexing/src/main/java/org/apache/rya/indexing/entity/storage/mongo/MongoEntityStorage.java ---
    @@ -242,4 +281,46 @@ private static Bson makeExplicitTypeFilter(final RyaURI typeId) {
             return Stream.of(dataTypeFilter, valueFilter);
         }
    +
    +    private boolean detectDuplicates(final Entity entity) throws EntityStorageException {
    +        boolean hasDuplicate = false;
    +        if (duplicateDataDetector.isDetectionEnabled()) {
    +            if (mongoTypeStorage == null) {
    +                mongoTypeStorage = new MongoTypeStorage(mongo, ryaInstanceName);
    +            }
    +            final Builder builder = new Builder();
    +            builder.setSubject(entity.getSubject());
    +            boolean abort = false;
    +            for (final RyaURI typeRyaUri : entity.getExplicitTypeIds()) {
    +                Optional<Type> type;
    +                try {
    +                    type = mongoTypeStorage.get(typeRyaUri);
    +                } catch (final TypeStorageException e) {
    +                    throw new EntityStorageException("Unable to get entity type: " + typeRyaUri, e);
    +                }
    +                if (type.isPresent()) {
    +                    final ConvertingCursor<TypedEntity> cursor = search(Optional.empty(), type.get(), Collections.emptySet());
    +                    while (cursor.hasNext()) {
    --- End diff --

    Oops, it's only grabbing one Entity to compare. I reworked it so it now finds a set of potential Entities to compare, based on them having all the same explicit type IDs. The subjects don't matter, and querying for properties doesn't help us since we're trying to find properties that are CLOSE but not quite equal. That leaves us with only the Types to narrow our initial search of Entities to check. Once we grab the Entities, the (near) duplicate data detector is run over them.
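
    For anyone following along, here is a rough, self-contained sketch of the two-step flow described above: filter stored Entities by explicit type IDs only, then run a near-duplicate check over every candidate rather than just the first. This is not the code in the PR. `SimpleEntity`, `findCandidates`, `isNearDuplicate`, and the numeric `tolerance` are hypothetical stand-ins, and "same explicit type IDs" is assumed to mean the two sets are equal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified stand-ins for Rya's Entity and duplicate detector.
class DuplicateSketch {

    /** Simplified entity: a subject, its explicit type IDs, and numeric properties. */
    static final class SimpleEntity {
        final String subject;
        final Set<String> explicitTypeIds;
        final Map<String, Double> properties;

        SimpleEntity(String subject, Set<String> explicitTypeIds, Map<String, Double> properties) {
            this.subject = subject;
            this.explicitTypeIds = explicitTypeIds;
            this.properties = properties;
        }
    }

    /** Step 1: narrow the search by type only; subjects and property values are ignored. */
    static List<SimpleEntity> findCandidates(List<SimpleEntity> stored, SimpleEntity entity) {
        List<SimpleEntity> candidates = new ArrayList<>();
        for (SimpleEntity other : stored) {
            // Assumption: "same explicit type IDs" means the two sets are equal.
            if (other.explicitTypeIds.equals(entity.explicitTypeIds)) {
                candidates.add(other);
            }
        }
        return candidates;
    }

    /** Assumed notion of "close": same property names, every value within the tolerance. */
    static boolean isNearDuplicate(SimpleEntity a, SimpleEntity b, double tolerance) {
        if (!a.properties.keySet().equals(b.properties.keySet())) {
            return false;
        }
        for (Map.Entry<String, Double> e : a.properties.entrySet()) {
            if (Math.abs(e.getValue() - b.properties.get(e.getKey())) > tolerance) {
                return false;
            }
        }
        return true;
    }

    /** Step 2: run the near-duplicate check over every candidate, not just the first. */
    static boolean detectDuplicates(List<SimpleEntity> stored, SimpleEntity entity, double tolerance) {
        for (SimpleEntity candidate : findCandidates(stored, entity)) {
            if (isNearDuplicate(entity, candidate, tolerance)) {
                return true;
            }
        }
        return false;
    }
}
```

    Filtering on properties up front would need an exact-match query, which is exactly what a near-duplicate search cannot rely on, so the sketch defers all property comparison to the second step.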