dlmarion commented on code in PR #5982:
URL: https://github.com/apache/accumulo/pull/5982#discussion_r2557729570
##########
core/src/main/java/org/apache/accumulo/core/data/LoadPlan.java:
##########
@@ -510,4 +524,41 @@ public static LoadPlan compute(URI file, Map<String,String> properties,
return builder.build();
}
}
+
+ /**
+ * Computes a load plan for a rfile based on the minimum and maximum row present across all
+ * locality groups.
+ *
+ * @param properties used when opening the rfile, see
+ * {@link org.apache.accumulo.core.client.rfile.RFile.ScannerOptions#withTableProperties(Map)}
+ *
+ * @return a load plan of type {@link RangeType#FILE}
+ * @since 2.1.5
+ */
+ public static LoadPlan compute(URI file, Map<String,String> properties) throws IOException {
+ var path = new Path(file);
+ var conf = new Configuration();
+ var fs = FileSystem.get(path.toUri(), conf);
+ CryptoService cs =
+ CryptoFactoryLoader.getServiceForClient(CryptoEnvironment.Scope.TABLE, properties);
+ CachableBlockFile.CachableBuilder cb =
+ new CachableBlockFile.CachableBuilder().fsPath(fs, path).conf(conf).cryptoService(cs);
+ try (var reader = new org.apache.accumulo.core.file.rfile.RFile.Reader(cb)) {
Review Comment:
Is there a reason not to use FileOperations.ReaderBuilder?
##########
core/src/main/java/org/apache/accumulo/core/data/LoadPlan.java:
##########
@@ -90,13 +96,19 @@ public enum RangeType {
* row and end row can be null. The start row is exclusive and the end row is inclusive (like
* Accumulo tablets). A common use case for this would be when files were partitioned using a
* table's splits. When using this range type, the start and end row must exist as splits in the
- * table or an exception will be thrown at load time.
+ * table or an exception will be thrown at load time. This RangeType is the most efficient for
+ * accumulo to load, and it enables only loading files to tablets that overlap data in the file.
*/
TABLE,
/**
- * Range that correspond to known rows in a file. For this range type, the start row and end row
- * must be non-null. The start row and end row are both considered inclusive. At load time,
- * these data ranges will be mapped to table ranges.
+ * Range that corresponds to the minimum and maximum rows in a file. For this range type, the
+ * start row and end row must be non-null. The start row and end row are both considered
+ * inclusive. At load time, these data ranges will be mapped to table ranges. For this RangeType
+ * accumulo has to do more work at load to map the file range to tablets. Also, this will map a
Review Comment:
```suggestion
* Accumulo has to do more work at load to map the file range to tablets. Also, this will map a
```
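
For context on the javadoc under discussion, here is a minimal sketch of what "mapping a file range to tablets" could look like. It is not Accumulo code: the class `FileRangeSketch` and the method `overlappingTablets` are hypothetical, rows are compared as plain Java strings rather than bytes, and the split points are made up. It only illustrates that a FILE range (inclusive minimum and maximum row) has to be resolved against the table's splits, whereas a TABLE range already names tablet boundaries (exclusive start, inclusive end).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration only; none of these names exist in Accumulo.
class FileRangeSketch {

  // Returns the tablets, written as "(prevEndRow, endRow]", that overlap the
  // inclusive row range [minRow, maxRow]. Missing bounds print as -inf / +inf.
  static List<String> overlappingTablets(TreeSet<String> splits, String minRow, String maxRow) {
    List<String> tablets = new ArrayList<>();
    // Largest split strictly below minRow is the exclusive start of the first tablet.
    String prev = splits.lower(minRow);
    // Walk the splits at or after minRow; each one is an inclusive tablet end row.
    for (String end : splits.tailSet(minRow, true)) {
      tablets.add("(" + (prev == null ? "-inf" : prev) + ", " + end + "]");
      prev = end;
      if (end.compareTo(maxRow) >= 0) {
        return tablets; // this tablet already contains maxRow
      }
    }
    // maxRow lies past the last split, so the last tablet (no end row) overlaps too.
    tablets.add("(" + (prev == null ? "-inf" : prev) + ", +inf]");
    return tablets;
  }

  public static void main(String[] args) {
    TreeSet<String> splits = new TreeSet<>(List.of("f", "m", "t"));
    // A file whose rows span "c" through "p" touches three tablets.
    System.out.println(overlappingTablets(splits, "c", "p"));
    // prints [(-inf, f], (f, m], (m, t]]
  }
}
```

With a TABLE range, the caller would supply `(f, m]` directly and no such lookup would be needed, which is the efficiency difference the new javadoc text describes.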