Github user paul-rogers commented on a diff in the pull request:
https://github.com/apache/drill/pull/877#discussion_r127365605
--- Diff:
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java
---
@@ -1851,9 +1860,81 @@ private static String relativize(String baseDir, String childPath) {
.relativize(fullPathWithoutSchemeAndAuthority.toUri()));
if (relativeFilePath.isAbsolute()) {
throw new IllegalStateException(String.format("Path %s is not a subpath of %s.",
- basePathWithoutSchemeAndAuthority.toUri().toString(), fullPathWithoutSchemeAndAuthority.toUri().toString()));
+ basePathWithoutSchemeAndAuthority.toUri().getPath(), fullPathWithoutSchemeAndAuthority.toUri().getPath()));
+ }
+ return relativeFilePath.toUri().getPath();
+ }
+ }
+
+ /**
+ * Used to identify metadata version by the deserialization "metadata_version" first property
+ * from the metadata cache file
+ */
+ public static class MetadataVersion {
+ @JsonProperty("metadata_version")
+ public String textVersion;
+
+ /**
+ * Supported metadata versions.
+ * Note: keep them synchronized with {@link ParquetTableMetadataBase} versions
+ */
+ enum Versions {
+ v1(Constants.V1),
+ v2(Constants.V2),
+ v3(Constants.V3),
+ v3_1(Constants.V3_1);
+
+ private final String version;
+
+ Versions(String version) {
+ this.version = version;
+ }
+
+ public String getVersion() {
+ return version;
+ }
+
+ public static Versions fromString(String version) {
+ for (Versions v : Versions.values()) {
+ if (v.version.equalsIgnoreCase(version)) {
+ return v;
+ }
+ }
+ return null;
+ }
+
+ public static class Constants {
+ public static final String V1 = "v1";
+ public static final String V2 = "v2";
+ public static final String V3 = "v3";
+ public static final String V3_1 = "v3_1";
+ }
+ }
+
+ /**
+ * @param fs current file system
+ * @param path of metadata cache file
+ * @return true if metadata version is supported, false otherwise
+ * @throws IOException if parquet metadata can't be deserialized from the json file
+ */
+ public static boolean isVersionSupported(FileSystem fs, Path path) throws IOException {
--- End diff --
Here, we can see the reason for the separation. We now open each file
twice: once to check the version, and again to deserialize if the version is
OK. Better to just deserialize the file. There are two cases:
* Minor change: the current deserializer can read the file. This is the
case when file version <= code version. It can also be the case, as here, when
the file version is bumped without adding new fields.
* Major change: deserialization fails with a Jackson exception, which tells
us we cannot read the file because we don't recognize the format. This should
only happen when file version > code version.
In either case, we can attempt to deserialize the file:
* Deserialize the file.
* If an error occurs, we don't support the file format.
* If OK, but file version > code version, we don't support it.
* If OK, and file version = code version, this is the normal path.
* If OK, but file version < code version, some special fix-up may be
needed to convert the deserialized data to the current format.
Given this, we don't really need a zillion version classes. We need one
class that handles the logic for the current version, a deserializer class
that handles the above (including any needed data updates), and the
serialized data class.
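
A minimal sketch of the single-pass flow described above, to make the four
outcomes concrete. Everything here is hypothetical and not taken from the Drill
code: `MetadataLoadSketch`, `Outcome`, `CachedMetadata`, `Reader`, and
`compareVersions` are illustrative names, and `Reader` merely stands in for the
Jackson call (whose `JsonProcessingException` extends `IOException`). The
version comparison assumes strings shaped like "v3" or "v3_1".

```java
import java.io.IOException;

public class MetadataLoadSketch {

  /** Possible results of a single read attempt. */
  public enum Outcome { UNREADABLE, TOO_NEW, CURRENT, NEEDS_UPGRADE }

  /** Hypothetical stand-in for the deserialized metadata cache file. */
  public static class CachedMetadata {
    public final String version;

    public CachedMetadata(String version) {
      this.version = version;
    }
  }

  /** Hypothetical reader; a real one would wrap the Jackson deserialization. */
  public interface Reader {
    CachedMetadata read() throws IOException;
  }

  /** Compares versions such as "v3" and "v3_1" numerically, part by part. */
  public static int compareVersions(String a, String b) {
    String[] pa = a.substring(1).split("_");
    String[] pb = b.substring(1).split("_");
    for (int i = 0; i < Math.max(pa.length, pb.length); i++) {
      int x = i < pa.length ? Integer.parseInt(pa[i]) : 0; // missing part counts as 0
      int y = i < pb.length ? Integer.parseInt(pb[i]) : 0;
      if (x != y) {
        return Integer.compare(x, y);
      }
    }
    return 0;
  }

  /** Opens the file once: deserialize first, then decide based on the version. */
  public static Outcome load(Reader reader, String codeVersion) {
    CachedMetadata data;
    try {
      data = reader.read();
    } catch (IOException e) {
      // Major change: the format is not recognized.
      return Outcome.UNREADABLE;
    }
    int cmp = compareVersions(data.version, codeVersion);
    if (cmp > 0) {
      return Outcome.TOO_NEW;        // readable, but written by newer code
    }
    if (cmp == 0) {
      return Outcome.CURRENT;        // normal path
    }
    return Outcome.NEEDS_UPGRADE;    // older file: fix-up to current format
  }
}
```

With this shape, the caller branches on one result instead of opening the file
a second time, and any fix-up for older versions lives next to the
`NEEDS_UPGRADE` case.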
---