imply-cheddar commented on code in PR #15148:
URL: https://github.com/apache/druid/pull/15148#discussion_r1361687034
##########
extensions-core/parquet-extensions/src/main/java/org/apache/druid/data/input/parquet/simple/ParquetGroupConverter.java:
##########
@@ -52,10 +52,14 @@ public class ParquetGroupConverter
private static final long NANOS_PER_MILLISECOND =
TimeUnit.MILLISECONDS.toNanos(1);
/**
- * See {@link ParquetGroupConverter#convertField(Group, String)}
+ * Convert a parquet group field as though it were a map. Logical types of
'list' and 'map' will be transformed
+ * into java lists and maps respectively ({@link
ParquetGroupConverter#convertLogicalList} and
+ * {@link ParquetGroupConverter#convertLogicalMap}), repeated fields will
also be translated to lists, and
+ * primitive types will be extracted into an ingestion friendly state (e.g.
'int' and 'long'). Finally,
+ * if a field is not present, this method will return null.
*/
@Nullable
- private static Object convertField(Group g, String fieldName, boolean
binaryAsString)
Review Comment:
Just double checking, this `binaryAsString` argument. That was being passed
around but never actually used?
Any changes you want to make to `ParquetToJson` are good, that code is just
for our own dev purposes, but these changes in `ParquetGroupConverter` is
changing the code that our current parquet-based file ingestions end up using.
So, I just want to double check that this cleanup is truly a redundant argument
and not just something that wasn't needed for `ParquetToJson` but used for the
other production code paths.
##########
extensions-core/parquet-extensions/src/main/java/org/apache/druid/data/input/parquet/simple/ParquetGroupConverter.java:
##########
Review Comment:
This is a design nit: with the change to remove the `static` label from the
methods, I would expect them to start coming after the constructor. That is,
we tend to follow the code flow of static first, then constructor then class
methods. I noticed that it was inverted because I kept searching for the
constructor at the top of the file and didn't see it, and then realized it was
at the bottom and that's because the methods used to be static but now are not.
The current structure reads really nicely for the diff though, I hope that
the whitespace change that I'm nit picking doesn't screw up the diff...
##########
extensions-core/parquet-extensions/src/main/java/org/apache/druid/data/input/parquet/simple/ParquetGroupConverter.java:
##########
@@ -492,10 +510,12 @@ private static BigDecimal convertBinaryToDecimal(Binary
value, int precision, in
}
private final boolean binaryAsString;
+ private final boolean convertCorruptDates;
- public ParquetGroupConverter(boolean binaryAsString)
+ public ParquetGroupConverter(boolean binaryAsString, boolean
convertCorruptDates)
{
this.binaryAsString = binaryAsString;
Review Comment:
is `binaryAsString` still used somewhere?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]