[GitHub] drcrallen commented on a change in pull request #5913: Move Caching Cluster Client to java streams and allow parallel intermediate merges

GitBox Thu, 30 Aug 2018 10:26:30 -0700

drcrallen commented on a change in pull request #5913: Move Caching Cluster 
Client to java streams and allow parallel intermediate merges
URL: https://github.com/apache/incubator-druid/pull/5913#discussion_r214114896


 ##########
 File path: server/src/main/java/io/druid/client/CachingClusteredClient.java
 ##########
 @@ -389,169 +461,248 @@ private String computeCurrentEtag(final Set<ServerToSegment> segments, @Nullable
       }
     }
 
-    private List<Pair<Interval, byte[]>> pruneSegmentsWithCachedResults(
+    private Pair<ServerToSegment, Optional<byte[]>> lookupInCache(
+        Pair<ServerToSegment, Cache.NamedKey> key,
+        Map<Cache.NamedKey, Optional<byte[]>> cache
+    )
+    {
+      final ServerToSegment segment = key.getLhs();
+      final Cache.NamedKey segmentCacheKey = key.getRhs();
+      final Interval segmentQueryInterval = segment.getSegmentDescriptor().getInterval();
+      final Optional<byte[]> cachedValue = Optional
+          .ofNullable(cache.get(segmentCacheKey))
+          // Shouldn't happen in practice, but can screw up unit tests where cache state is mutated in crazy
+          // ways when the cache returns null instead of an optional.
+          .orElse(Optional.empty());
+      if (!cachedValue.isPresent()) {
+        // if populating cache, add segment to list of segments to cache if it is not cached
+        final String segmentIdentifier = segment.getServer().getSegment().getIdentifier();
+        addCachePopulatorKey(segmentCacheKey, segmentIdentifier, segmentQueryInterval);
+      }
+      return Pair.of(segment, cachedValue);
+    }
+
+    /**
+     * This materializes the input segment stream in order to let the BulkGet stuff in the cache system work
 
 Review comment:
   Unordered is fine. The issue comes from the one-by-one nature of a `Stream`: 
a bulk call needs a collection (or array) of keys up front, so the stream has 
to be materialized first. Other systems sometimes refer to such a materialized 
batch as a bundle.
   
   Transparent bundling would be nicer, but it is beyond the scope of this PR.
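
   To illustrate the point about materializing for a bulk call, here is a 
minimal standalone sketch. It is not the code in this PR and does not use the 
Druid `Cache` API: `BulkLookupSketch`, `CountingCache`, `getBulk` and 
`lookupAll` are hypothetical names made up for the example. It assumes a cache 
that can answer many keys in one call and shows that the stream has to be 
collected before that call can be made, while the order of the keys does not 
matter.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class BulkLookupSketch
{
  // Hypothetical stand-in for a cache with a bulk lookup. It counts calls so
  // that the batching is observable.
  static final class CountingCache
  {
    final Map<String, byte[]> backing = new HashMap<>();
    int bulkCalls = 0;

    Map<String, byte[]> getBulk(Collection<String> keys)
    {
      bulkCalls++;
      final Map<String, byte[]> found = new HashMap<>();
      for (String key : keys) {
        final byte[] value = backing.get(key);
        if (value != null) {
          found.put(key, value);
        }
      }
      return found;
    }
  }

  // A Stream hands out elements one at a time, so the keys are collected into
  // a list first and then sent to the cache in a single bulk call.
  static Map<String, Optional<byte[]>> lookupAll(Stream<String> keys, CountingCache cache)
  {
    final List<String> materialized = keys.collect(Collectors.toList());
    final Map<String, byte[]> found = cache.getBulk(materialized);
    final Map<String, Optional<byte[]>> results = new LinkedHashMap<>();
    for (String key : materialized) {
      // Keys missing from the cache map to an empty Optional.
      results.put(key, Optional.ofNullable(found.get(key)));
    }
    return results;
  }

  public static void main(String[] args)
  {
    final CountingCache cache = new CountingCache();
    cache.backing.put("segment-a", new byte[]{1});
    final Map<String, Optional<byte[]>> results =
        lookupAll(Stream.of("segment-a", "segment-b").unordered(), cache);
    System.out.println("bulk calls: " + cache.bulkCalls);
    System.out.println("segment-a cached: " + results.get("segment-a").isPresent());
    System.out.println("segment-b cached: " + results.get("segment-b").isPresent());
  }
}
```

   The cost of this approach is that the whole key set is held in memory 
before the lookup, which is the trade-off transparent bundling would avoid.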

