[GitHub] [parquet-mr] rshkv commented on a change in pull request #946: PARQUET-2120: Dictionary command should handle missing dictionary pages

GitBox Sun, 13 Feb 2022 18:32:04 -0800


rshkv commented on a change in pull request #946:
URL: https://github.com/apache/parquet-mr/pull/946#discussion_r805173716




##########
File path: parquet-cli/src/main/java/org/apache/parquet/cli/commands/ShowDictionaryCommand.java
##########
@@ -75,40 +75,12 @@ public int run() throws IOException {
       while ((dictionaryReader = reader.getNextDictionaryReader()) != null) {
         DictionaryPage page = dictionaryReader.readDictionaryPage(descriptor);
 
-        Dictionary dict = page.getEncoding().initDictionary(descriptor, page);
-
-        console.info("\nRow group {} dictionary for \"{}\":", rowGroup, column, page.getCompressedSize());
-        for (int i = 0; i <= dict.getMaxId(); i += 1) {
-          switch(type.getPrimitiveTypeName()) {
-            case BINARY:
-              if (type.getLogicalTypeAnnotation() instanceof LogicalTypeAnnotation.StringLogicalTypeAnnotation) {
-                console.info("{}: {}", String.format("%6d", i),
-                    Util.humanReadable(dict.decodeToBinary(i).toStringUsingUTF8(), 70));
-              } else {
-                console.info("{}: {}", String.format("%6d", i),
-                    Util.humanReadable(dict.decodeToBinary(i).getBytesUnsafe(), 70));
-              }
-              break;
-            case INT32:
-              console.info("{}: {}", String.format("%6d", i),
-                dict.decodeToInt(i));
-              break;
-            case INT64:
-              console.info("{}: {}", String.format("%6d", i),
-                  dict.decodeToLong(i));
-              break;
-            case FLOAT:
-              console.info("{}: {}", String.format("%6d", i),
-                  dict.decodeToFloat(i));
-              break;
-            case DOUBLE:
-              console.info("{}: {}", String.format("%6d", i),
-                  dict.decodeToDouble(i));
-              break;
-            default:
-              throw new IllegalArgumentException(
-                  "Unknown dictionary type: " + type.getPrimitiveTypeName());
-          }
+        if (page != null) {

Review comment:
       This check is the crux of the change.

##########
File path: parquet-cli/src/main/java/org/apache/parquet/cli/commands/ShowDictionaryCommand.java
##########
@@ -75,40 +75,12 @@ public int run() throws IOException {
       while ((dictionaryReader = reader.getNextDictionaryReader()) != null) {
         DictionaryPage page = dictionaryReader.readDictionaryPage(descriptor);
 
-        Dictionary dict = page.getEncoding().initDictionary(descriptor, page);
-
-        console.info("\nRow group {} dictionary for \"{}\":", rowGroup, column, page.getCompressedSize());
-        for (int i = 0; i <= dict.getMaxId(); i += 1) {
-          switch(type.getPrimitiveTypeName()) {
-            case BINARY:
-              if (type.getLogicalTypeAnnotation() instanceof LogicalTypeAnnotation.StringLogicalTypeAnnotation) {
-                console.info("{}: {}", String.format("%6d", i),
-                    Util.humanReadable(dict.decodeToBinary(i).toStringUsingUTF8(), 70));
-              } else {
-                console.info("{}: {}", String.format("%6d", i),
-                    Util.humanReadable(dict.decodeToBinary(i).getBytesUnsafe(), 70));
-              }
-              break;
-            case INT32:
-              console.info("{}: {}", String.format("%6d", i),
-                dict.decodeToInt(i));
-              break;
-            case INT64:
-              console.info("{}: {}", String.format("%6d", i),
-                  dict.decodeToLong(i));
-              break;
-            case FLOAT:
-              console.info("{}: {}", String.format("%6d", i),
-                  dict.decodeToFloat(i));
-              break;
-            case DOUBLE:
-              console.info("{}: {}", String.format("%6d", i),
-                  dict.decodeToDouble(i));
-              break;
-            default:
-              throw new IllegalArgumentException(
-                  "Unknown dictionary type: " + type.getPrimitiveTypeName());
-          }
+        if (page != null) {
+          console.info("\nRow group {} dictionary for \"{}\":", rowGroup, column);
+          Dictionary dict = page.getEncoding().initDictionary(descriptor, page);
+          printDictionary(dict, type);
+        } else {
+          console.info("\nRow group {} has no dictionary for \"{}\"", rowGroup, column);

Review comment:
       For a file that mixes dictionary-encoded and non-dictionary-encoded pages, the
output would look like this, for example:
   ```
   Row group 0 has no dictionary for "col"
   
   Row group 1 dictionary for "col":
        0: "b"
        1: "c"
   ```
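
The branching behind that output can be sketched roughly as follows. This is a minimal illustration, not the actual `ShowDictionaryCommand` code: `DictionaryOutputSketch` and `describe` are hypothetical names, plain string lists stand in for Parquet dictionary pages, and a `null` list models a row group whose column has no dictionary page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DictionaryOutputSketch {
    // Hypothetical helper: a null list models a missing dictionary page.
    static List<String> describe(int rowGroup, String column, List<String> dictionary) {
        List<String> lines = new ArrayList<>();
        if (dictionary != null) {
            lines.add(String.format("Row group %d dictionary for \"%s\":", rowGroup, column));
            // One line per dictionary entry, with the index right-aligned in six columns.
            for (int i = 0; i < dictionary.size(); i++) {
                lines.add(String.format("%6d: \"%s\"", i, dictionary.get(i)));
            }
        } else {
            // Report the missing dictionary instead of dereferencing null.
            lines.add(String.format("Row group %d has no dictionary for \"%s\"", rowGroup, column));
        }
        return lines;
    }

    public static void main(String[] args) {
        describe(0, "col", null).forEach(System.out::println);
        describe(1, "col", Arrays.asList("b", "c")).forEach(System.out::println);
    }
}
```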

##########
File path: parquet-cli/src/main/java/org/apache/parquet/cli/commands/ShowDictionaryCommand.java
##########
@@ -122,6 +94,41 @@ public int run() throws IOException {
     return 0;
   }
 
+  private void printDictionary(Dictionary dict, PrimitiveType type) {

Review comment:
       This is just a copy-paste of the `for` block above.

##########
File path: parquet-cli/src/main/java/org/apache/parquet/cli/commands/ShowDictionaryCommand.java
##########
@@ -75,40 +75,12 @@ public int run() throws IOException {
       while ((dictionaryReader = reader.getNextDictionaryReader()) != null) {
         DictionaryPage page = dictionaryReader.readDictionaryPage(descriptor);
 
-        Dictionary dict = page.getEncoding().initDictionary(descriptor, page);
-
-        console.info("\nRow group {} dictionary for \"{}\":", rowGroup, column, page.getCompressedSize());

Review comment:
       I removed the `page.getCompressedSize()` argument here because the format
string has only two `{}` placeholders, so the value was never displayed in the
first place.
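
The effect can be illustrated with a small sketch. It assumes `console` is an SLF4J-style logger that fills `{}` placeholders left to right and silently ignores surplus arguments. The `format` method below is a hypothetical stand-in written for illustration, not the logging library's code.

```java
public class PlaceholderDemo {
    // Hypothetical stand-in for SLF4J-style substitution: each "{}" consumes
    // the next argument, and arguments without a placeholder are dropped.
    static String format(String pattern, Object... args) {
        StringBuilder out = new StringBuilder();
        int argIndex = 0;
        int from = 0;
        int at;
        while ((at = pattern.indexOf("{}", from)) >= 0 && argIndex < args.length) {
            out.append(pattern, from, at).append(args[argIndex++]);
            from = at + 2;
        }
        // Copy whatever follows the last substituted placeholder.
        out.append(pattern.substring(from));
        return out.toString();
    }

    public static void main(String[] args) {
        // Three arguments, two placeholders: the third value (1234) never appears.
        System.out.println(format("Row group {} dictionary for \"{}\":", 0, "col", 1234));
    }
}
```

With two placeholders and three arguments, the third value does not show up in the message, which is why dropping the argument leaves the output unchanged.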




