[ https://issues.apache.org/jira/browse/BEAM-11457?focusedWorklogId=528117&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-528117 ]
ASF GitHub Bot logged work on BEAM-11457:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 24/Dec/20 12:42
Start Date: 24/Dec/20 12:42
Worklog Time Spent: 10m
Work Description: aromanenko-dev commented on a change in pull request #13543:
URL: https://github.com/apache/beam/pull/13543#discussion_r548480861
##########
File path: sdks/java/io/hadoop-format/src/main/java/org/apache/beam/sdk/io/hadoop/format/HadoopFormatIO.java
##########
@@ -475,6 +479,18 @@
return withValueTranslation(function).toBuilder().setValueCoder(coder).build();
}
+ /**
+ * Determines if key-value clone should be skipped or not (default is 'false'). Hadoop formats
+ * typically work with Writable data structures which are mutable. Therefore, this IO will clone
+ * read key-values if they are not in the list of well known immutable types. However, in case
+ * user does use key/value translation functions, resulting key-values might already be
+ * immutable. In such case, additional copy is unnecessary overhead and can be avoided by
+ * setting skip to 'true'.
+ */
+ public Read<K, V> withSkipKeyValueClone(boolean value) {
Review comment:
Please, make it configurable separately for keys and values like we do
with other configuration options.
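
To illustrate why separate switches could be useful, here is a minimal, self-contained sketch. It does not use Beam or Hadoop: `MutableText` is a hypothetical stand-in for a mutable `Writable`, and `SkipCloneSketch.read` with its `skipKeyClone` / `skipValueClone` parameters is an assumed shape, not the actual `HadoopFormatIO` API. It only shows what happens when a reader reuses one mutable instance per record and the copy is skipped for keys but not for values.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a mutable Hadoop Writable; not a Beam or Hadoop class. */
final class MutableText {
  private String value = "";

  void set(String v) {
    value = v;
  }

  String get() {
    return value;
  }

  /** Plays the role of the clone that the IO would normally perform via a coder. */
  MutableText copy() {
    MutableText c = new MutableText();
    c.set(value);
    return c;
  }
}

class SkipCloneSketch {
  /** Simulates a record reader that reuses one key and one value instance for all records. */
  static List<MutableText[]> read(
      String[][] records, boolean skipKeyClone, boolean skipValueClone) {
    MutableText key = new MutableText();
    MutableText value = new MutableText();
    List<MutableText[]> out = new ArrayList<>();
    for (String[] r : records) {
      key.set(r[0]);
      value.set(r[1]);
      // Keys and values are handled independently, as the review comment suggests.
      out.add(
          new MutableText[] {
            skipKeyClone ? key : key.copy(), skipValueClone ? value : value.copy()
          });
    }
    return out;
  }

  public static void main(String[] args) {
    String[][] records = {{"k1", "v1"}, {"k2", "v2"}};
    List<MutableText[]> cloned = read(records, false, false);
    List<MutableText[]> keysSkipped = read(records, true, false);
    System.out.println(cloned.get(0)[0].get()); // k1: the copy is isolated from later records
    System.out.println(keysSkipped.get(0)[0].get()); // k2: the shared key was overwritten
    System.out.println(keysSkipped.get(0)[1].get()); // v1: values were still cloned
  }
}
```

With independent flags, a pipeline that translates only keys into an immutable type could skip the key copy while values keep the default, safe behaviour.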
##########
File path: sdks/java/io/hadoop-format/src/main/java/org/apache/beam/sdk/io/hadoop/format/HadoopFormatIO.java
##########
@@ -475,6 +479,18 @@
return withValueTranslation(function).toBuilder().setValueCoder(coder).build();
}
+ /**
+ * Determines if key-value clone should be skipped or not (default is 'false'). Hadoop formats
Review comment:
Please, update `HadoopFormatIO`'s Javadoc as well.
##########
File path: sdks/java/io/hadoop-format/src/main/java/org/apache/beam/sdk/io/hadoop/format/HadoopFormatIO.java
##########
@@ -488,13 +504,16 @@
if (valueCoder == null) {
valueCoder = getDefaultCoder(getValueTypeDescriptor(), coderRegistry);
}
+ boolean skipKeyValueClone = getSkipKeyValueClone() == null ? false : getSkipKeyValueClone();
Review comment:
Please, set default values in `read()` with a builder.
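
A rough sketch of what this could look like, written as plain Java so it runs on its own. `ReadConfigSketch`, its `Builder`, and the `withSkipKeyClone` / `withSkipValueClone` names are hypothetical and only mimic the `toBuilder()...build()` pattern visible in the diff; the real `Read` class and its builder are not reproduced here. The point is that the defaults are fixed once in `read()`, so the getters never return `null` and no null check is needed later.

```java
/** Hypothetical, simplified stand-in for a Read configuration; not the HadoopFormatIO code. */
final class ReadConfigSketch {
  private final boolean skipKeyClone;
  private final boolean skipValueClone;

  private ReadConfigSketch(Builder builder) {
    this.skipKeyClone = builder.skipKeyClone;
    this.skipValueClone = builder.skipValueClone;
  }

  /** Entry point, analogous to a read() factory: defaults are set here, in one place. */
  static ReadConfigSketch read() {
    return new Builder().setSkipKeyClone(false).setSkipValueClone(false).build();
  }

  /** Returns a new configuration; the receiver is left unchanged. */
  ReadConfigSketch withSkipKeyClone(boolean value) {
    return toBuilder().setSkipKeyClone(value).build();
  }

  /** Returns a new configuration; the receiver is left unchanged. */
  ReadConfigSketch withSkipValueClone(boolean value) {
    return toBuilder().setSkipValueClone(value).build();
  }

  boolean getSkipKeyClone() {
    return skipKeyClone;
  }

  boolean getSkipValueClone() {
    return skipValueClone;
  }

  /** Copies the current settings into a fresh builder. */
  Builder toBuilder() {
    return new Builder().setSkipKeyClone(skipKeyClone).setSkipValueClone(skipValueClone);
  }

  static final class Builder {
    private boolean skipKeyClone;
    private boolean skipValueClone;

    Builder setSkipKeyClone(boolean value) {
      this.skipKeyClone = value;
      return this;
    }

    Builder setSkipValueClone(boolean value) {
      this.skipValueClone = value;
      return this;
    }

    ReadConfigSketch build() {
      return new ReadConfigSketch(this);
    }
  }

  public static void main(String[] args) {
    ReadConfigSketch config = ReadConfigSketch.read().withSkipValueClone(true);
    System.out.println(config.getSkipKeyClone()); // false (default from read())
    System.out.println(config.getSkipValueClone()); // true (explicitly set)
  }
}
```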
##########
File path: sdks/java/io/hadoop-format/src/main/java/org/apache/beam/sdk/io/hadoop/format/HadoopFormatIO.java
##########
@@ -475,6 +479,18 @@
return withValueTranslation(function).toBuilder().setValueCoder(coder).build();
}
+ /**
+ * Determines if key-value clone should be skipped or not (default is 'false'). Hadoop formats
+ * typically work with Writable data structures which are mutable. Therefore, this IO will clone
+ * read key-values if they are not in the list of well known immutable types. However, in case
+ * user does use key/value translation functions, resulting key-values might already be
Review comment:
Should it be used only with `withKeyTranslation()` or
`withValueTranslation()`, or can it be used independently?
Issue Time Tracking
-------------------
Worklog Id: (was: 528117)
Time Spent: 40m (was: 0.5h)
> Enable skip key-value clone for HadoopFormatIO
> -----------------------------------------------
>
> Key: BEAM-11457
> URL: https://issues.apache.org/jira/browse/BEAM-11457
> Project: Beam
> Issue Type: Improvement
> Components: io-java-hadoop-format
> Affects Versions: 2.25.0
> Reporter: Jozef Vilcek
> Assignee: Jozef Vilcek
> Priority: P3
> Time Spent: 40m
> Remaining Estimate: 0h
>
> HadoopFormatIO eagerly clones key-values if they are not of a well-known
> immutable type. This makes sense given how Hadoop Writables behave. However,
> a user can use key/value translation functions, which may already output
> immutable types. In that case, it would be beneficial to avoid the extra clone
> via the coder.
> It would be great if the coder could be consulted about the type and its need
> for cloning. However, I am not aware whether such detection is possible. I
> propose adding a config parameter for skipping the clone, which the IO user can set.