marton-bod commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r842866645
##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java:
##########
@@ -261,6 +268,13 @@ public boolean nextKeyValue() throws IOException {
       while (true) {
         if (currentIterator.hasNext()) {
           current = currentIterator.next();
+          Configuration conf = context.getConfiguration();
+          if (HiveIcebergStorageHandler.isDelete(conf, conf.get(Catalogs.NAME))) {
+            if (current instanceof GenericRecord) {
+              PositionDeleteInfo pdi = IcebergAcidUtil.parsePositionDeleteInfoFromRecord((GenericRecord) current);
Review Comment:
> Do we need a GenericRecord here?
We need a GenericRecord here because it's straightforward to grab the
positional delete info for each record from a GenericRecord. I haven't
looked into how to grab it from a VectorizedRowBatch (or its Parquet
equivalent), but it seemed more complicated at first glance. Whether we
need the instanceof check is a different question. I guess we don't, since
we already assert during compilation that vectorization is off, so I can
remove that extra check.
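
To illustrate the idea of dropping the instanceof check, here is a minimal, self-contained sketch. It does not use the real Hive or Iceberg classes: `RowRecord`, `DeleteInfo`, `parse`, `onNext`, and the field names `file_path` and `pos` are all hypothetical stand-ins, and the actual record layout used by `IcebergAcidUtil` may differ. The only point is that, once vectorization is ruled out up front, a direct cast is enough and a broken invariant fails loudly instead of being skipped silently.

```java
import java.util.HashMap;
import java.util.Map;

public class DeleteInfoSketch {

  // Hypothetical stand-in for a row-mode record (not Iceberg's GenericRecord).
  static final class RowRecord {
    private final Map<String, Object> fields = new HashMap<>();

    RowRecord set(String name, Object value) {
      fields.put(name, value);
      return this;
    }

    Object get(String name) {
      return fields.get(name);
    }
  }

  // Hypothetical holder for the positional delete info of one row.
  static final class DeleteInfo {
    final String filePath;
    final long position;

    DeleteInfo(String filePath, long position) {
      this.filePath = filePath;
      this.position = position;
    }
  }

  // Reads the (assumed) file path and row position fields from the record.
  static DeleteInfo parse(RowRecord record) {
    return new DeleteInfo((String) record.get("file_path"), (Long) record.get("pos"));
  }

  // Reader-side step without the instanceof branch: for delete queries we
  // cast directly. If a non-row object ever shows up (i.e. the
  // "vectorization is off" invariant is broken), the cast throws a
  // ClassCastException rather than quietly skipping the record.
  static DeleteInfo onNext(Object current, boolean isDelete) {
    if (!isDelete) {
      return null;
    }
    return parse((RowRecord) current);
  }

  public static void main(String[] args) {
    RowRecord record = new RowRecord().set("file_path", "data-00001.parquet").set("pos", 42L);
    DeleteInfo info = onNext(record, true);
    System.out.println(info.filePath + " @ " + info.position);
  }
}
```

The trade-off is that the direct cast turns an unexpected input type into an immediate failure, which seems preferable here to a guarded branch that would never be exercised.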
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]