keith-turner opened a new issue #967: Prototype adding async get methods to Transaction
URL: https://github.com/apache/fluo/issues/967

[Fetching multiple cells](http://fluo.apache.org/tour/multi-get/) in the Fluo tour describes `get` methods that Fluo has to quickly read multiple cells. While using these methods is faster than calling `get(row, col)` sequentially, they can be a bit cumbersome. The same thing performance-wise could be accomplished with asynchronous `get` methods; however, I am not convinced this would be less cumbersome. I have been thinking about this idea for a while, but I have yet to convince myself it's a fully baked or good idea. I am opening this issue to share my thoughts, not to advocate for this feature.

Suppose the following methods were added to SnapshotBase. These methods would queue up the get operation in the background and return immediately.

```java
CompletableFuture<String> getsAsync(String row, Column column);
CompletableFuture<String> getsAsync(String row, Column column, String defaultValue);
```

Using these methods, this [process()](https://gist.github.com/keith-turner/57e124c715c2542242f11eda85b3128c#file-contentobserver-java-L53) method from a Fluo Tour exercise solution could be written as follows.

```java
public void process(TransactionBase tx, String row, Column col) throws Exception {

  // Use Future here instead of CompletableFuture because it's shorter and has the
  // needed get method. This should be much faster than calling three blocking
  // get methods.
  Future<String> content = tx.getsAsync(row, CONTENT_COL);
  Future<String> status = tx.getsAsync(row, REF_STATUS_COL);
  Future<String> processed = tx.getsAsync(row, PROCESSED_COL, "false");

  // Instead of doing status.equals below, we have to do status.get().equals. Same
  // with processed.
  if (status.get().equals("referenced") && processed.get().equals("false")) {
    adjustCounts(tx, +1, tokenize(content.get()));
    tx.set(row, PROCESSED_COL, "true");
  }

  if (status.get().equals("unreferenced")) {
    for (Column c : new Column[] {PROCESSED_COL, CONTENT_COL, REF_COUNT_COL, REF_STATUS_COL})
      tx.delete(row, c);

    if (processed.get().equals("true")) {
      adjustCounts(tx, -1, tokenize(content.get()));
    }
  }
}
```

The method above is only one line shorter, and I think having to call `status.get()` instead of just using `status` is a bit more cumbersome and possibly bug-prone. For example, `status.equals("referenced")` above would probably compile (because `equals` takes `Object`), but it would always return false.

The [adjustCounts](https://gist.github.com/keith-turner/57e124c715c2542242f11eda85b3128c#file-contentobserver-java-L35) method from a Tour exercise solution could be rewritten as follows.

```java
// This method reads the current counts for the passed in words and writes out
// the current count plus the delta for each word.
private void adjustCounts(TransactionBase tx, int delta, List<String> words) throws Exception {
  List<Future<Void>> futures = new ArrayList<>();

  for (String word : new HashSet<>(words)) {
    Future<Void> future = tx.getsAsync("w:" + word, WORD_COUNT, "0")
        .thenApply(Integer::parseInt)
        .thenApply(count -> delta + count)
        .thenAccept(newCount -> {
          if (newCount == 0)
            tx.delete("w:" + word, WORD_COUNT);
          else
            tx.set("w:" + word, WORD_COUNT, newCount + "");
        });
    // Keep the future so it can be waited on below.
    futures.add(future);
  }

  // Wait for all futures to finish. Future.get() throws checked exceptions.
  for (Future<Void> future : futures) {
    future.get();
  }
}
```

Personally, I think this method is slightly easier to understand, given an understanding of CompletableFuture. I found CompletableFuture a bit daunting when I first looked at it, but it grew on me.

I am interested in seeing more use cases for these proposed async get methods. I am also interested in prototyping them in order to make it possible to experiment with them.
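
As a starting point for experimenting, here is a minimal sketch of the proposed `getsAsync` methods that runs without Fluo. Everything in it is an assumption made for illustration: `AsyncGetSketch` is a hypothetical helper, the column is a plain `String` rather than a `Column`, and the blocking read is a lookup in an in-memory map standing in for a snapshot. It only shows the shape of the API (queue the read on an executor, return a `CompletableFuture`), not how Fluo would implement it.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.BiFunction;

// Hypothetical helper, not part of Fluo: wraps any blocking (row, column) -> value
// lookup so that each read is queued on an executor and the call returns immediately.
class AsyncGetSketch {
  private final BiFunction<String, String, String> blockingGet;
  private final ExecutorService executor;

  AsyncGetSketch(BiFunction<String, String, String> blockingGet, ExecutorService executor) {
    this.blockingGet = blockingGet;
    this.executor = executor;
  }

  // Mirrors the proposed getsAsync(row, column); completes with null if the cell is absent.
  CompletableFuture<String> getsAsync(String row, String column) {
    return CompletableFuture.supplyAsync(() -> blockingGet.apply(row, column), executor);
  }

  // Mirrors the proposed getsAsync(row, column, defaultValue).
  CompletableFuture<String> getsAsync(String row, String column, String defaultValue) {
    return getsAsync(row, column).thenApply(v -> v == null ? defaultValue : v);
  }

  public static void main(String[] args) throws Exception {
    // In-memory stand-in for a snapshot (an assumption); keys are "row|column".
    Map<String, String> cells = new ConcurrentHashMap<>();
    cells.put("d:1|status", "referenced");

    ExecutorService executor = Executors.newFixedThreadPool(4);
    try {
      AsyncGetSketch tx = new AsyncGetSketch((r, c) -> cells.get(r + "|" + c), executor);

      // Both reads are queued before either result is awaited.
      CompletableFuture<String> status = tx.getsAsync("d:1", "status");
      CompletableFuture<String> processed = tx.getsAsync("d:1", "processed", "false");

      System.out.println(status.get().equals("referenced")); // compares the value
      System.out.println(processed.get());                   // falls back to the default
      System.out.println(status.equals("referenced"));       // compares the future itself
    } finally {
      executor.shutdown();
    }
  }
}
```

Running `main` prints `true`, `false`, `false`. The last line is the pitfall mentioned above: calling `equals` on the future compiles, but it compares the future object with a `String` and is never true.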
