rdblue commented on a change in pull request #3049:
URL: https://github.com/apache/iceberg/pull/3049#discussion_r712292310



##########
File path: core/src/main/java/org/apache/iceberg/SnapshotParser.java
##########
@@ -146,19 +151,31 @@ static Snapshot fromJson(FileIO io, JsonNode node) {
     }
 
     Integer schemaId = JsonUtil.getIntOrNull(SCHEMA_ID, node);
+    String keyMetadataLocation = 
JsonUtil.getStringOrNull(KEY_METADATA_LOCATION, node);
 
     if (node.has(MANIFEST_LIST)) {
       // the manifest list is stored in a manifest list file
       String manifestList = JsonUtil.getString(MANIFEST_LIST, node);
-      return new BaseSnapshot(
-          io, sequenceNumber, snapshotId, parentId, timestamp, operation, 
summary, schemaId, manifestList);
+      return new BaseSnapshot(io, sequenceNumber, snapshotId, parentId, 
timestamp, operation, summary, schemaId,
+          manifestList, keyMetadataLocation);
 
     } else {
       // fall back to an embedded manifest list. pass in the manifest's 
InputFile so length can be
       // loaded lazily, if it is needed
       List<ManifestFile> manifests = 
Lists.transform(JsonUtil.getStringList(MANIFESTS, node),
           location -> new GenericManifestFile(io.newInputFile(location), 0));
-      return new BaseSnapshot(io, snapshotId, parentId, timestamp, operation, 
summary, schemaId, manifests);
+
+      ByteBuffer keyMetadata = null;
+      if (keyMetadataLocation != null) {
+        try (InputStream stream = 
io.newInputFile(keyMetadataLocation).newStream()) {
+          keyMetadata = ByteBuffer.wrap(ByteStreams.toByteArray(stream));
+        } catch (IOException e) {
+          throw new UncheckedIOException("Failed to read key metadata at 
location " + keyMetadataLocation, e);
+        }
+      }

Review comment:
       I agree, this seems like something we could put in an IO utility.
   
   Also, do we expect key metadata to be reused across snapshots? If so, then 
we should add some indirection to cache key metadata by location.
   
   Another thing I've thought about is caching parsed `Snapshot` instances. 
Currently, each time we load a table we will reload all of the snapshots. But 
we could save some time by returning an existing `Snapshot` instance from a 
previously loaded JSON file.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to