nsivabalan commented on code in PR #5462:
URL: https://github.com/apache/hudi/pull/5462#discussion_r861390575
##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/keygen/RowKeyGeneratorHelper.java:
##########
@@ -234,13 +237,14 @@ public static Object getNestedFieldVal(Row row, List<Integer> positions) {
    * @param structType schema of interest
    * @param field field of interest for which the positions are requested for
    * @param isRecordKey {@code true} if the field requested for is a record key. {@code false} in case of a partition path.
-   * @return the positions of the field as per the struct type.
+   * @return the positions of the field as per the struct type and the root fields datatype.
    */
-  public static List<Integer> getNestedFieldIndices(StructType structType, String field, boolean isRecordKey) {
+  public static Pair<List<Integer>, DataType> getNestedFieldSchemaInfo(StructType structType, String field, boolean isRecordKey) {
Review Comment:
   We don't allow nested fields in the row-writer code path when meta fields are disabled. That is the only code path where we fetch data directly from the InternalRow; otherwise we use a Spark SQL Row to fetch the data. For these reasons, this method enforces certain additional constraints for record keys that do not apply to partition path fields.
   I am open to discussing whether we can relax the meta-fields-disabled constraint for nested fields too, but we can take that up in a follow-up PR.
##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/keygen/BuiltinKeyGenerator.java:
##########
@@ -97,88 +98,69 @@ public String getPartitionPath(Row row) {
   @PublicAPIMethod(maturity = ApiMaturityLevel.EVOLVING)
   public String getPartitionPath(InternalRow internalRow, StructType structType) {
     try {
-      initDeserializer(structType);
-      Row row = sparkRowSerDe.deserializeRow(internalRow);
-      return getPartitionPath(row);
+      buildFieldSchemaInfoIfNeeded(structType);
+      return RowKeyGeneratorHelper.getPartitionPathFromInternalRow(internalRow, getPartitionPathFields(),
+          hiveStylePartitioning, partitionPathSchemaInfo);
     } catch (Exception e) {
       throw new HoodieIOException("Conversion of InternalRow to Row failed with exception " + e);
     }
   }

-  private void initDeserializer(StructType structType) {
-    if (sparkRowSerDe == null) {
-      sparkRowSerDe = HoodieSparkUtils.getDeserializer(structType);
-    }
-  }
-
-  void buildFieldPositionMapIfNeeded(StructType structType) {
+  void buildFieldSchemaInfoIfNeeded(StructType structType) {
     if (this.structType == null) {
       // parse simple fields
       getRecordKeyFields().stream()
           .filter(f -> !(f.contains(".")))
           .forEach(f -> {
             if (structType.getFieldIndex(f).isDefined()) {
-              recordKeyPositions.put(f, Collections.singletonList((Integer) (structType.getFieldIndex(f).get())));
+              int fieldIndex = (int) structType.getFieldIndex(f).get();
Review Comment:
   This is due to how the code evolved. We can fix it now.
##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/io/storage/row/HoodieRowCreateHandle.java:
##########
@@ -107,16 +107,15 @@ public HoodieRowCreateHandle(HoodieTable table, HoodieWriteConfig writeConfig, S
   /**
    * Writes an {@link InternalRow} to the underlying HoodieInternalRowFileWriter. Before writing, value for meta columns are computed as required
    * and wrapped in {@link HoodieInternalRow}. {@link HoodieInternalRow} is what gets written to HoodieInternalRowFileWriter.
+   *
    * @param record instance of {@link InternalRow} that needs to be written to the fileWriter.
    * @throws IOException
    */
  public void write(InternalRow record) throws IOException {
    try {
-      String partitionPath = record.getUTF8String(HoodieRecord.HOODIE_META_COLUMNS_NAME_TO_POS.get(
-          HoodieRecord.PARTITION_PATH_METADATA_FIELD)).toString();
+      String partitionPath = record.getUTF8String(3).toString();
Review Comment:
   Sorry, do you mean `final`? These are local variables, right?
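   The change above swaps a per-record map lookup for the bare ordinal `3`. One middle ground worth noting is resolving the ordinal once into a named constant; a minimal sketch of that pattern (the map contents mimic Hudi's meta columns, but the class and field names here are assumptions for illustration):

   ```java
   import java.util.HashMap;
   import java.util.Map;

   public class MetaColumnOrdinals {

     // Illustrative name-to-position map, mimicking HOODIE_META_COLUMNS_NAME_TO_POS.
     static final Map<String, Integer> NAME_TO_POS = new HashMap<>();
     static {
       NAME_TO_POS.put("_hoodie_commit_time", 0);
       NAME_TO_POS.put("_hoodie_commit_seqno", 1);
       NAME_TO_POS.put("_hoodie_record_key", 2);
       NAME_TO_POS.put("_hoodie_partition_path", 3);
       NAME_TO_POS.put("_hoodie_file_name", 4);
     }

     // Resolved once at class-load time, so the hot write path neither
     // repeats the map lookup nor carries an unexplained magic number.
     static final int PARTITION_PATH_POS = NAME_TO_POS.get("_hoodie_partition_path");
   }
   ```

   The write path would then read `record.getUTF8String(PARTITION_PATH_POS)`, keeping the performance of the literal while documenting which meta column it refers to.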
##########
hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java:
##########
@@ -386,12 +386,14 @@ public void validateTableProperties(Properties properties, WriteOperationType op
       throw new HoodieException(HoodieTableConfig.POPULATE_META_FIELDS.key() + " already disabled for the table. Can't be re-enabled back");
     }
-    // meta fields can be disabled only with SimpleKeyGenerator
-    if (!getTableConfig().populateMetaFields()
-        && !properties.getProperty(HoodieTableConfig.KEY_GENERATOR_CLASS_NAME.key(), "org.apache.hudi.keygen.SimpleKeyGenerator")
-        .equals("org.apache.hudi.keygen.SimpleKeyGenerator")) {
-      throw new HoodieException("Only simple key generator is supported when meta fields are disabled. KeyGenerator used : "
-          + properties.getProperty(HoodieTableConfig.KEY_GENERATOR_CLASS_NAME.key()));
+    // meta fields can be disabled only with SimpleKeyGenerator, NonPartitioned and ComplexKeyGen.
+    if (!getTableConfig().populateMetaFields()) {
Review Comment:
https://issues.apache.org/jira/browse/HUDI-3996
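   The diff widens the old single-class equality check to a set of supported key generators. A minimal sketch of that allow-list shape (the class names come from the diff's comment; the helper itself and its name are assumptions):

   ```java
   import java.util.Arrays;
   import java.util.HashSet;
   import java.util.Set;

   public class KeyGenValidation {

     // Key generators the diff says may be used with meta fields disabled.
     static final Set<String> SUPPORTED_WITHOUT_META_FIELDS = new HashSet<>(Arrays.asList(
         "org.apache.hudi.keygen.SimpleKeyGenerator",
         "org.apache.hudi.keygen.NonpartitionedKeyGenerator",
         "org.apache.hudi.keygen.ComplexKeyGenerator"));

     /**
      * Returns true if the configured key generator is allowed when
      * populateMetaFields is disabled. Defaults to SimpleKeyGenerator when
      * no class is configured, matching the original check's default.
      */
     static boolean isSupportedWithoutMetaFields(String keyGenClass) {
       String effective = keyGenClass != null ? keyGenClass : "org.apache.hudi.keygen.SimpleKeyGenerator";
       return SUPPORTED_WITHOUT_META_FIELDS.contains(effective);
     }
   }
   ```

   Checking membership in a set, rather than chaining `equals` calls, keeps the validation readable as more key generators are added.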