[GitHub] [accumulo] keith-turner commented on a change in pull request #2422: Initial implementation of my vision for implementation of Scan Servers.

GitBox Tue, 25 Jan 2022 08:52:31 -0800


keith-turner commented on a change in pull request #2422:
URL: https://github.com/apache/accumulo/pull/2422#discussion_r791032803




##########
File path: core/src/main/java/org/apache/accumulo/core/clientImpl/TabletServerBatchReaderIterator.java
##########
@@ -497,26 +499,44 @@ private void doLookups(Map<String,Map<KeyExtent,List<Range>>> binnedRanges,
     for (final String tsLocation : locations) {
 
       final Map<KeyExtent,List<Range>> tabletsRanges = binnedRanges.get(tsLocation);
-      if (maxTabletsPerRequest == Integer.MAX_VALUE || tabletsRanges.size() == 1) {
-        QueryTask queryTask = new QueryTask(tsLocation, tabletsRanges, failures, receiver, columns);
-        queryTasks.add(queryTask);
+      if (options.isUseScanServer()) {
+        // Ignore the tablets location and find a scan server to use
+        ScanServerLocator ssl = context.getScanServerLocator();
+        tabletsRanges.forEach((k, v) -> {
+          try {
+            String location = ssl.reserveScanServer(new TabletIdImpl(k));

Review comment:
       >  I figured that if a ScanServer had many threads and performed more 
than one scan at a time, then we would potentially run into the same situation 
we have in the TabletServer w/r/t memory usage.
   
   I can see the benefit of a single thread.  We could start by implementing 
the thread pool with a synchronous queue (syncQ) and a hard-coded size of one 
thread, which gives this really fast busy exception behavior.  Later on, as we 
gain experience, we could refine the scan server config and make the conditions 
that cause busy exceptions on a scan server configurable.
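
   A minimal sketch of that idea, assuming "syncQ" refers to 
`java.util.concurrent.SynchronousQueue`.  The class name 
`SingleThreadScanPool` and the exception `ScanServerBusyException` are 
hypothetical and are not part of this PR or of Accumulo.  With a core and 
maximum size of one and a queue that holds no tasks, a second submission fails 
immediately while the single thread is occupied:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class SingleThreadScanPool {

  /** Hypothetical busy signal; not an Accumulo class. */
  static class ScanServerBusyException extends Exception {
    ScanServerBusyException(Throwable cause) {
      super("scan server busy", cause);
    }
  }

  // One thread, no queueing: SynchronousQueue has no capacity, so a task is
  // accepted only if a thread can take it right away. The default rejection
  // policy (AbortPolicy) throws RejectedExecutionException otherwise.
  private final ThreadPoolExecutor pool =
      new ThreadPoolExecutor(1, 1, 0L, TimeUnit.MILLISECONDS, new SynchronousQueue<>());

  <T> Future<T> submitScan(Callable<T> scan) throws ScanServerBusyException {
    try {
      return pool.submit(scan);
    } catch (RejectedExecutionException e) {
      // Translate the rejection into a fast "busy" answer for the client.
      throw new ScanServerBusyException(e);
    }
  }

  void shutdown() {
    pool.shutdownNow();
  }

  public static void main(String[] args) throws Exception {
    SingleThreadScanPool scans = new SingleThreadScanPool();
    CountDownLatch release = new CountDownLatch(1);

    // The first scan occupies the only thread until released.
    Future<String> first = scans.submitScan(() -> {
      release.await();
      return "scan finished";
    });

    try {
      scans.submitScan(() -> "second scan");
      System.out.println("second scan accepted");
    } catch (ScanServerBusyException e) {
      System.out.println("second scan rejected: " + e.getMessage());
    }

    release.countDown();
    System.out.println(first.get());
    scans.shutdown();
  }
}
```

   One caveat with this setup: right after a scan completes, the worker thread 
needs a moment to return to the queue, so a submission that arrives in that 
window can also be rejected.  A client would need to treat a busy answer as 
retryable.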

##########
File path: core/src/main/java/org/apache/accumulo/core/clientImpl/TabletServerBatchReaderIterator.java
##########
@@ -497,26 +499,44 @@ private void doLookups(Map<String,Map<KeyExtent,List<Range>>> binnedRanges,
     for (final String tsLocation : locations) {
 
       final Map<KeyExtent,List<Range>> tabletsRanges = binnedRanges.get(tsLocation);
-      if (maxTabletsPerRequest == Integer.MAX_VALUE || tabletsRanges.size() == 1) {
-        QueryTask queryTask = new QueryTask(tsLocation, tabletsRanges, failures, receiver, columns);
-        queryTasks.add(queryTask);
+      if (options.isUseScanServer()) {
+        // Ignore the tablets location and find a scan server to use
+        ScanServerLocator ssl = context.getScanServerLocator();
+        tabletsRanges.forEach((k, v) -> {
+          try {
+            String location = ssl.reserveScanServer(new TabletIdImpl(k));

Review comment:
       Not sure if it is achievable, but it would be nice to have three things: 
information about scan servers from ZK that is cacheable on the client side (so 
clients do not have to go to ZK frequently), a busy signal from scan servers, 
and a client-side plugin that uses those two pieces of information to decide 
which scan servers to use.  If that is workable, it could avoid any single 
bottleneck process in the cluster, hopefully allowing this to scale up to many 
thousands of scan servers and thousands of clients.
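
       A rough sketch of how those three pieces might fit together on the 
client.  Everything here is hypothetical and is not an API from this PR: the 
names `ScanServerChooser`, `Selector`, and `reportBusy`, the `Supplier` that 
stands in for a ZK read, and the server addresses in `main`.  The server list 
is re-read only after a time-to-live expires, busy servers are remembered for a 
backoff period, and a pluggable selector sees both:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.function.Supplier;

class ScanServerChooser {

  /** Hypothetical plugin interface: picks a server from the cached list and busy set. */
  interface Selector {
    Optional<String> select(List<String> servers, Set<String> busy, String tabletId);
  }

  /** Example plugin: hash the tablet to a server, then skip past busy ones. */
  static final Selector HASH_SKIP_BUSY = (servers, busy, tabletId) -> {
    int n = servers.size();
    if (n == 0) {
      return Optional.empty();
    }
    int start = Math.floorMod(tabletId.hashCode(), n);
    for (int i = 0; i < n; i++) {
      String candidate = servers.get((start + i) % n);
      if (!busy.contains(candidate)) {
        return Optional.of(candidate);
      }
    }
    return Optional.empty();
  };

  private final Supplier<List<String>> zkReader; // stand-in for reading ZK
  private final long ttlMillis;
  private final Selector selector;
  private final Map<String,Long> busyUntil = new HashMap<>();
  private List<String> cached = List.of();
  private long fetchedAt;
  private boolean loaded;

  ScanServerChooser(Supplier<List<String>> zkReader, long ttlMillis, Selector selector) {
    this.zkReader = zkReader;
    this.ttlMillis = ttlMillis;
    this.selector = selector;
  }

  /** Records a busy signal from a server; it is avoided until the backoff ends. */
  synchronized void reportBusy(String server, long nowMillis, long backoffMillis) {
    busyUntil.put(server, nowMillis + backoffMillis);
  }

  /** Chooses a server, refreshing the cached list only when the TTL has expired. */
  synchronized Optional<String> choose(String tabletId, long nowMillis) {
    if (!loaded || nowMillis - fetchedAt >= ttlMillis) {
      cached = List.copyOf(zkReader.get());
      fetchedAt = nowMillis;
      loaded = true;
    }
    busyUntil.values().removeIf(until -> until <= nowMillis);
    return selector.select(cached, Set.copyOf(busyUntil.keySet()), tabletId);
  }

  public static void main(String[] args) {
    // Made-up addresses; a real client would read these from ZK.
    ScanServerChooser chooser = new ScanServerChooser(
        () -> List.of("ss1:9800", "ss2:9800", "ss3:9800"), 30_000, HASH_SKIP_BUSY);
    long now = System.currentTimeMillis();
    String server = chooser.choose("tablet-a", now).orElseThrow();
    System.out.println("first choice: " + server);
    chooser.reportBusy(server, now, 5_000);
    System.out.println("after busy signal: " + chooser.choose("tablet-a", now));
  }
}
```

       Because all state lives in the client, no central process has to track 
reservations.  The cost is that clients act on slightly stale information, so 
the busy signal has to be cheap and retryable.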



