[
https://issues.apache.org/jira/browse/GOBBLIN-1658?focusedWorklogId=779724&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-779724
]
ASF GitHub Bot logged work on GOBBLIN-1658:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 09/Jun/22 00:41
Start Date: 09/Jun/22 00:41
Worklog Time Spent: 10m
Work Description: phet commented on code in PR #3518:
URL: https://github.com/apache/gobblin/pull/3518#discussion_r892975231
##########
gobblin-service/src/main/java/org/apache/gobblin/service/modules/dataset/HiveDatasetDescriptor.java:
##########
@@ -39,34 +39,25 @@
@Slf4j
/**
* As of now, {@link HiveDatasetDescriptor} has same implementation as that of
{@link SqlDatasetDescriptor}.
- * Fields {@link HiveDatasetDescriptor#isPartitioned}, {@link
HiveDatasetDescriptor#partitionColumn} and
- * {@link HiveDatasetDescriptor#partitionFormat} are used for methods 'equals'
and 'hashCode'.
+ * Fields {@link HiveDatasetDescriptor#partitionColumn} and {@link
HiveDatasetDescriptor#partitionFormat}
+ * are used for methods 'equals' and 'hashCode'.
*/
@EqualsAndHashCode (exclude = {"whitelistBlacklist"}, callSuper = true)
public class HiveDatasetDescriptor extends SqlDatasetDescriptor {
- static final String IS_PARTITIONED_KEY = "isPartitioned";
static final String PARTITION_COLUMN = "partition.column";
static final String PARTITION_FORMAT = "partition.format";
static final String CONFLICT_POLICY = "conflict.policy";
- private final boolean isPartitioned;
private final String partitionColumn;
private final String partitionFormat;
private final String conflictPolicy;
WhitelistBlacklist whitelistBlacklist;
public HiveDatasetDescriptor(Config config) throws IOException {
super(config);
- this.isPartitioned = ConfigUtils.getBoolean(config, IS_PARTITIONED_KEY,
true);
- if (isPartitioned) {
- partitionColumn = ConfigUtils.getString(config, PARTITION_COLUMN,
DatePartitionHiveVersionFinder.DEFAULT_PARTITION_KEY_NAME);
- partitionFormat = ConfigUtils.getString(config, PARTITION_FORMAT,
DatePartitionHiveVersionFinder.DEFAULT_PARTITION_VALUE_DATE_TIME_PATTERN);
- conflictPolicy =
HiveCopyEntityHelper.ExistingEntityPolicy.REPLACE_PARTITIONS.name();
- } else {
- partitionColumn = "";
- partitionFormat = "";
- conflictPolicy =
HiveCopyEntityHelper.ExistingEntityPolicy.REPLACE_TABLE.name();
- }
+ partitionColumn = ConfigUtils.getString(config, PARTITION_COLUMN,
DatePartitionHiveVersionFinder.DEFAULT_PARTITION_KEY_NAME);
+ partitionFormat = ConfigUtils.getString(config, PARTITION_FORMAT,
DatePartitionHiveVersionFinder.DEFAULT_PARTITION_VALUE_DATE_TIME_PATTERN);
+ conflictPolicy =
HiveCopyEntityHelper.ExistingEntityPolicy.REPLACE_TABLE_AND_PARTITIONS.name();
Review Comment:
the values I see for these defaults are `"datepartition"`,
`"yyyy-MM-dd-HH"`, so not backwards compatible w/ the `""` and `""` previously
when `!isPartitioned`.
could that cause an issue? e.g. when the user doesn't use partitioning and
therefore gave nothing, so we fell back to `"datepartition"`, how would we
discern from when they pass that value explicitly with the intent to use
partitioning?
Issue Time Tracking
-------------------
Worklog Id: (was: 779724)
Time Spent: 0.5h (was: 20m)
> Remove isPartitioned reference from HiveDatasetDescriptor
> ---------------------------------------------------------
>
> Key: GOBBLIN-1658
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1658
> Project: Apache Gobblin
> Issue Type: Improvement
> Components: gobblin-core
> Reporter: Urmi Mustafi
> Assignee: Abhishek Tiwari
> Priority: Major
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> Remove extra parameter from user side differentiating snapshot and
> partitioned copy. User shouldn't have to provide this flag. We can generalize
> the default values of partition format, column, and conflict policy without
> knowing this.
--
This message was sent by Atlassian Jira
(v8.20.7#820007)