This is an automated email from the ASF dual-hosted git repository.

abti pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/gobblin.git


The following commit(s) were added to refs/heads/master by this push:
     new fc508ca27 Update CHANGELOG to reflect changes in 0.17.0
fc508ca27 is described below

commit fc508ca272cc2ea0a9f5cdd28e72f60ce3c7912b
Author: Abhishek Tiwari <[email protected]>
AuthorDate: Tue Jun 13 22:04:52 2023 -0700

    Update CHANGELOG to reflect changes in 0.17.0
---
 CHANGELOG.md | 237 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 237 insertions(+)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 418cfb1f1..51abe3e66 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,3 +1,240 @@
+GOBBLIN 0.17.0
+--------------
+
+### Created Date: 06/13/2023
+
+* [GOBBLIN-1836] Ensure Task Reliability: Handle Job Cancellation and Graceful 
Exits for Error-Free Completion 
+* [GOBBLIN-1805] Support watermark for the most recent hour for quiet topics 
+* [GOBBLIN-1833] Emit Completeness watermark information in 
SnapshotCommitEvent 
+* [GOBBLIN-1832] Emit warning instead of failing job in retention
+* [GOBBLIN-1831] Use flowexecutionid in kafka monitor and jobnames 
+* [GOBBLIN-1830] Improved Container Transition Tracking in Streaming Data 
Ingestion 
+* [GOBBLIN-1823] Improved Container Calculation and Allocation Methodology 
+* [GOBBLIN-1829] Fixed bug where the wrong workunit event was being tracked  
+* [GOBBLIN-1828] Implement Timeout for Creating Writer Functionality 
+* [GOBBLIN-1827] Added check that if nested field is optional and has a 
non-null default 
+* [GOBBLIN-1826] Changed isAssignableFrom() to isSuperTypeOf() per Guava 20 
javadocs  
+* [GOBBLIN-1825] Fail Hive retention job if deleting underlying files fails 
+* [GOBBLIN-1824] Improved the Efficiency of Work Planning in Manifest-Based 
DistCp Jobs 
+* [GOBBLIN-1822] Logging for Abnormal Helix Task States 
+* [GOBBLIN-1821] Allow flow execution ID to propagate to the Job ID if it exists 
+* [GOBBLIN-1820] Added null default value to observability events 
+* [GOBBLIN-1819] Log helix workflow information and timeout information during 
submission wait / polling 
+* [GOBBLIN-1810] Support general Iceberg catalog (support configurable 
behavior for metadata retention policy) 
+* [GOBBLIN-1818] Initialize yarn clients in yarn app launcher so that a child 
class can override the yarn client creation logic 
+* [GOBBLIN-1817] Changed some deprecated code and fixed minor code style issues 
+* [GOBBLIN-1813] Helix workflows submission timeouts made configurable 
+* [GOBBLIN-1816] Added job properties and GaaS instance ID to observability 
event 
+* [GOBBLIN-1814] Added MRJobLauncher configurability for any failing mapper to 
be fatal to the MR job 
+* [GOBBLIN-1811] Fix Iceberg Registration Serialization 
+* [GOBBLIN-1810] Support general Iceberg catalog in IcebergMetadataWriter 
+* [GOBBLIN-1815] Refactor yarn app launchers to support extending these 
classes 
+* [GOBBLIN-1809] Add new lookback version finder for use with iceberg 
retention 
+* [GOBBLIN-1808] Bump Guava version from 15.0 to 20.0 
+* [GOBBLIN-1807] Replaced conjars.org with conjars.wensel.net 
+* [GOBBLIN-1806] Submit dataset summary event post commit and integrate them 
into GaaSObservabilityEvent 
+* [GOBBLIN-1805] Check watermark for the most recent hour for quiet topics 
+* [GOBBLIN-1804] Merge similar logic between 
FlowConfig{,V2}ResourceLocalHandler.update into single base class impl. 
+* [GOBBLIN-1804] Reject flow config updates that would fail compilation by 
returning service error 
+* [GOBBLIN-1802] Register iceberg table metadata update with destination side 
catalog 
+* [GOBBLIN-1799] Fix add spec and actual number flows scheduled metrics 
+* [GOBBLIN-1798] Add backoff retry when we access mysql db for flow spec or 
dag action 
+* [GOBBLIN-1797] Skip scheduling flows far into future 
+* [GOBBLIN-1796] Log startup command when container fails to startup 
+* [GOBBLIN-1795] Make manifest-based copy support facl 
+* [GOBBLIN-1794] Add defaults to newly added fields in observability events 
+* [GOBBLIN-1793] Add metrics to measure and isolate bottleneck for init
+* [GOBBLIN-1792] Upgrade Mockito to 4.* 
+* [GOBBLIN-1791] Prevent the adding of flowspec compilation errors to the 
scheduler  
+* [GOBBLIN-1790] Add and change appropriate job status fields for 
observability events 
+* [GOBBLIN-1779] Ability to filter datasets that contain non optional unions 
+* [GOBBLIN-1789] Create Generic Iceberg Data Node to Support Different Types 
of Catalogs 
+* [GOBBLIN-1787] Ability to delete multiple watermarks in a state store 
+* [GOBBLIN-1786] Support Other Catalog Types for Iceberg Distcp 
+* [GOBBLIN-1785] Add MR_JARS_BASE_DIR and logic to delete old mr jar dirs 
+* [GOBBLIN-1784] Only clean dags from the dag manager if a flow event is 
received 
+* [GOBBLIN-1783] Initialize scheduler with batch gets instead of individual 
get per flow 
+* [GOBBLIN-1782] Fix Merge State for Flow Pending Resume statuses 
+* [GOBBLIN-1781] Make Helix offline instance purging thread safe in the yarn 
service 
+* [GOBBLIN-1780] Refactor/rename YarnServiceIT to YarnServiceTest 
+* [GOBBLIN-1773] Fix bugs in quota manager 
+* [GOBBLIN-1778] Add house keeping thread in DagManager to periodically sync 
in memory state with mysql table 
+* [GOBBLIN-1777] Register gauge metrics for change monitors 
+* [GOBBLIN-1775] Make GMIP Hive metadatawriter gracefully fail 
+* [GOBBLIN-1774] Util for detecting non optional uniontype columns based on 
Hive Table metadata 
+* [GOBBLIN-1771] Clean up logs for dataset commit and file cleanup 
+* [GOBBLIN-1770] Allow null values for fields in GaaSObservabilityEvent.Issue 
fields which are optional 
+* [GOBBLIN-1769] Change a noisy log that indicates that the queue capacity is 
almost full
+* [GOBBLIN-1768] Fix constructor in KafkaJobStatusMonitorFactory so that it 
can be injected
+* [GOBBLIN-1767] Update references to deprecated Mysql connector/j driver to 
new name 
+* [GOBBLIN-1766] Define metric to measure lag from producer to consumer
+* [GOBBLIN-1765] Add support to sync metadata for dir in manifest based copy 
+* [GOBBLIN-1764] Emit observability event 
+* [GOBBLIN-1763] D2 markup/down for all live GaaS services not only leader 
+* [GOBBLIN-1762] Upgrade Gobblin OSS Hadoop version to 2.10.0 
+* [GOBBLIN-1761] Update Gobblin OSS Slack channel link to a never-expire link 
+* [GOBBLIN-1758] Disable flaky HiveMaterializerTest on CI/CD 
+* [GOBBLIN-1757] Refactor manifest, add reader/writer and iterator for 
efficient reading 
+* [GOBBLIN-1756] Fix the issue that causes skipping flows for multihop jobs 
+* [GOBBLIN-1755] Support extended ACLs and sticky bit for file based distcp 
+* [GOBBLIN-1754] Fixes for mysql store change monitors 
+* [GOBBLIN-1759] Add error reporting when attempting to resolve flow configs 
+* [GOBBLIN-1753] Migrate DB connection pool from o.a.commons.dbcp/dbcp2 to 
HikariCP 
+* [GOBBLIN-1752] Fix race condition where FSTemplateCatalog would update at 
the same 
+* [GOBBLIN-1750] Add schemas for observability events in GaaS 
+* [GOBBLIN-1749] Add dependency for handling xz-compressed Avro file 
+* [GOBBLIN-1748] Add logs to debug multi-hop flows creation, progression, and 
cleanup 
+* [GOBBLIN-1747] Add job.name and job.id to kafka and compaction workunits 
+* [GOBBLIN-1746] Add fs.uri to FsDatasetDescriptor to support copy between 
volumes in GaaS 
+* [GOBBLIN-1745] Fix bug in SimpleKafkaSpecProducer 
+* [GOBBLIN-1744] Improve handling of null value edge cases when querying Helix 
+* [GOBBLIN-1743] Ensure GobblinTaskRunner works without Yarn use 
+* [GOBBLIN-1742] Do not close DestinationDatasetHandlerService prematurely 
+* [GOBBLIN-1741] Create manifest based dataset finder 
+* [GOBBLIN-1739] Define Datanodes and Dataset Descriptor for Iceberg 
+* [GOBBLIN-1737] Fix bug when using mysql user quota manager 
+* [GOBBLIN-1738] Move dataset handler code before cleaning up staging data 
+* [GOBBLIN-1736] Add metrics for change stream monitor and mysql quota manager 
+* [GOBBLIN-1734] Make DestinationDatasetHandler work on streaming sources 
+* [GOBBLIN-1735] Correct a log line and GTE with correct number of total task 
count 
+* [GOBBLIN-1733] Support multiple node types in shared flowgraph, fix logs 
+* [GOBBLIN-1732] Search for dummy file in writer directory 
+* [GOBBLIN-1730] Include flow execution id when trying to cancel/submit job using 
SimpleKafkaSpecProducer 
+* [GOBBLIN-1731] Enable HiveMetadataWriter to override table schema 
+* [GOBBLIN-1728] Fix YarnService incorrect container allocation behavior 
+* [GOBBLIN-1729] Use root cause for checking if exception is transient 
+* [GOBBLIN-1727] Use delete API to delete the helix job instead of stopping it 
+* [GOBBLIN-1724] Support a shared flowgraph layout in GaaS 
+* [GOBBLIN-1725] Fix bugs in gaas warm standby mode 
+* [GOBBLIN-1726] Avro 1.9 upgrade of Gobblin OSS 
+* [GOBBLIN-1721] Give option to cancel helix workflow through Delete API 
+* [GOBBLIN-1723] Ignore AlreadyExistsException in hive writer 
+* [GOBBLIN-1722] Add log line for committing/retrieving watermarks in 
streaming 
+* [GOBBLIN-1720] Add ancestors owner permissions preservations for iceberg 
distcp 
+* [GOBBLIN-1712] Fail GMIP container for known transient exceptions to avoid 
data loss 
+* [GOBBLIN-1707] Enhance IcebergDataset to detect when files already at dest 
then proceed with only delta 
+* [GOBBLIN-1719] Replace moveToTrash with moveToAppropriateTrash for hadoop 
trash 
+* [GOBBLIN-1718] Define DagActionStoreMonitor to listen for kill/resume
+* [GOBBLIN-1717] Correct semantics of IcebergDatasetTest and streamline both 
impl and test code 
+* [GOBBLIN-1716] Refactor HighLevelConsumer to make consumer initialization 
configurable
+* [GOBBLIN-1707] Update IcebergDataset to incorporate all snapshots, not only 
the current one 
+* [GOBBLIN-1714] Use FileNotFoundException when determining files in 
source/target instead of generic IOException 
+* [GOBBLIN-1713] Add missing sql source validation 
+* [GOBBLIN-1712] Fail container for known transient exceptions to avoid data 
loss 
+* [GOBBLIN-1707] Add IcebergTableTest unit test 
+* [GOBBLIN-1708] Improve TimeAwareRecursiveCopyableDataset to lookback only 
into datefolders that match range 
+* [GOBBLIN-1710] Make Codecov optional in CI and not fail 
+* [GOBBLIN-1704] Purge offline helix instances during startup 
+* [GOBBLIN-1709] Create Iceberg Datasets Finder, Iceberg Dataset and FileSet 
to generate Copy Entities to support Distcp for Iceberg 
+* [GOBBLIN-1706] Add DagActionStore to store the action to kill/resume one 
flow execution 
+* [GOBBLIN-1705] New consumer service to monitor changes to FlowSpecStore 
+* [GOBBLIN-1702] Fix helix job wait completion bug when job goes to STOPPING 
state 
+* [GOBBLIN-1700] Remove unused coveralls-gradle-plugin dependency 
+* [GOBBLIN-1701] Replace jcenter with either maven central or gradle plugin 
portal 
+* [GOBBLIN-1699] Log progress of reducer task for visibility with slow 
compaction jobs 
+* [GOBBLIN-1695] Fix: Failure to add spec executors doesn't block deployment 
+* [GOBBLIN-1703] Avoid double quota increase for adhoc flows 
+* [GOBBLIN-1697] Have a separate resource handler to rely on CDC stream to do 
message forwarding 
+* [GOBBLIN-1696] Implement file based flowgraph that detects changes to the 
underlying files
+* [GOBBLIN-1694] Add GMCE topic explicitly to hive commit event 
+* [GOBBLIN-1691] Add MysqlUserQuotaManager 
+* [GOBBLIN-1689] Decouple compiler from scheduler in warm standby mode 
+* [GOBBLIN-1690] Improve logging in ORC Writer 
+* [GOBBLIN-1698] Fast fail during work unit generation based on config 
+* [GOBBLIN-1686] Allow all Iceberg exceptions to be fault tolerant 
+* [GOBBLIN-1684] Stub for FileSystem based message buffer 
+* [GOBBLIN-1673] [GOBBLIN-1683] Skeleton code for handling messages between 
task runner / application master for Dynamic work unit allocation 
+* [GOBBLIN-1681] Guard against exists fs call as well 
+* [GOBBLIN-1678] Refactor git flowgraph component to be extensible 
+* [GOBBLIN-1677] Fix timezone property to read from key correctly 
+* [GOBBLIN-1675] Add pagination for GaaS on server side 
+* [GOBBLIN-1672] Refactor metrics from DagManager into its own class, add 
metrics 
+* [GOBBLIN-1671] Fix gobblin.sh script to add external jars as colon separated 
to HADOOP_CLASSPATH 
+* [GOBBLIN-1670] Remove rat tasks and unneeded checkstyles blocking build 
pipeline 
+* [GOBBLIN-1669] Clean up TimeAwareRecursiveCopyableDataset to support seconds 
in time
+* [GOBBLIN-1668] Add audit counts for iceberg registration 
+* [GOBBLIN-1667] Create new predicate - ExistingPartitionSkipPredicate 
+* [GOBBLIN-1667] Support true ABORT on existing entity
+* [GOBBLIN-1664] Allow table to flush after write failure 
+* [GOBBLIN-1663] Add some debug log lines around GMIP hive commit events 
+* [GOBBLIN-1662] Fix running counts for retried flows 
+* [GOBBLIN-1657] Update completion watermark on change_property in 
IcebergMetadataWriter 
+* [GOBBLIN-1656] Return a http status 503 on GaaS when quota is exceeded for 
user or flowgroup 
+* [GOBBLIN-1654] Add capacity floor to avoid aggressively requesting resource 
and small files
+* [GOBBLIN-1653] Shorten job name length if it exceeds 255 characters 
+* [GOBBLIN-1652] Add more logging in the KafkaJobStatusMonitor in case it fails to 
process one GobblinTrackingEvent 
+* [GOBBLIN-1651] Add config to set close timeout in HiveRegister 
+* [GOBBLIN-1650] Implement flowGroup quotas for the DagManager 
+* [GOBBLIN-1648] Complete use of JDBC DataSource 'read-only' validation query 
by incorporating where previously omitted 
+* [GOBBLIN-1647] Add hive commit GTE to HiveMetadataWriter  
+* [GOBBLIN-1644] Log assigned participant when helix participant check fails 
+* [GOBBLIN-1645] Change the prefix of dagManager heartbeat to make it 
consistent with other metrics 
+* [GOBBLIN-1641] Add meter for sla exceeded flows 
+* [GOBBLIN-1640] Add an API in AbstractBaseKafkaConsumerClient to list 
selected topics 
+* [GOBBLIN-1639] Prevent metrics reporting if configured, clean up workunit 
count metric 
+* [GOBBLIN-1638] Fix unbalanced running count metrics due to Azkaban failures 
+* [GOBBLIN-1637] Add writer, operation, and partition info to failed metadata 
writer events 
+* [GOBBLIN-1636] Close DatasetCleaner after clean task 
+* [GOBBLIN-1635] Avoid loading env configuration when using config store to 
improve the performance 
+* [GOBBLIN-1634] Add retries on flow sla kills 
+* [GOBBLIN-1633] Fix compaction actions on job failure not retried if 
compaction succeeds 
+* [GOBBLIN-1632] Use data node aliases to figure out data node names before 
using DMAS 
+* [GOBBLIN-1631] Emit heartbeat for dagManagerThread 
+* [GOBBLIN-1630] Remove flow level metrics for adhoc flows 
+* [GOBBLIN-1613] Add metadata writers field to GMCE schema 
+* [GOBBLIN-1629] Make GobblinMCEWriter be able to catch error when calculating 
hive specs 
+* [GOBBLIN-1620] Make yarn container allocation group by helix tag 
+* [GOBBLIN-1616] Add close connection logic in SalesforceSource 
+* [GOBBLIN-1628] Add/fix some fields of MetadataWriterFailureEvent 
+* [GOBBLIN-1627] Provide option to convert datanodes names 
+* [GOBBLIN-1626] Use user supplied props to create FileSystem in 
DatasetCleanerTask 
+* [GOBBLIN-1625] Add coverage for edge cases when table paths do not exist, 
check parents 
+* [GOBBLIN-1624] Refactor quota management, fix various bugs in accounting of 
running jobs
+* [GOBBLIN-1623] Fix NPE when trying to close RestApiConnector 
+* [GOBBLIN-1622] Clear bad mysql packages from cache in CI/CD machines 
+* [GOBBLIN-1621] Make HelixRetriggeringJobCallable emit job skip event when 
job is dropped because a previous job is running 
+* [GOBBLIN-1619] WriterUtils.mkdirsWithRecursivePermission contains race 
condition and puts unnecessary load on filesystem 
+* [GOBBLIN-1617] Pass configurations to some HadoopUtils APIs 
+* [GOBBLIN-1616] Make RestApiConnector be able to close the connection finally 
+* [GOBBLIN-1615] Add config to set log level for any class 
+* [GOBBLIN-1614] Fix bug where partitioned tables would always return the 
wrong equali… 
+* [GOBBLIN-1612] Add description about downloading gradle wrapper 
+* [GOBBLIN-1611] Fix a wrong value for writer.codec.type in the document 
+* [GOBBLIN-1609] Don't flush on change_property operation 
+* [GOBBLIN-1608] Fix case where error GTE is incorrectly sent from MCE writer 
+* [GOBBLIN-1606] Change DEFAULT_GOBBLIN_COPY_CHECK_FILESIZE value 
+* [GOBBLIN-1605] Fix mysql ubuntu download 404 not found for Github Actions 
CI/CD 
+* [GOBBLIN-1604] Throw exception if there are no allocated requests due to 
lack of resources
+* [GOBBLIN-1603] Throw error if configured when encountering an IO exception  
+* [GOBBLIN-1601] Implement ChangePermissionCommitStep 
+* [GOBBLIN-1598] Fix metrics already exist issue in dag manager 
+* [GOBBLIN-1597] Add error handling in dagmanager to continue if dag fails to 
process
+* [GOBBLIN-1596] Ignore already exists exception if the table has already been 
created
+* [GOBBLIN-1594] Add guard in DagManager for improperly formed SLA 
+* [GOBBLIN-1593] Fix bugs in dag manager about metric reporting and job status 
monitor 
+* [GOBBLIN-1592] Make hive copy be able to apply filter on directory 
+* [GOBBLIN-1591] Lazily initialize FileContext and do not hold a reference to 
it 
+* [GOBBLIN-1590] Add low/high watermark information in event emitted by 
Gobblin cluster 
+* [GOBBLIN-1589] Add FileContextFactory to cache FileContext instances 
+* [GOBBLIN-1588] Send failure events for write failures when watermark is 
advanced in MCE writer 
+* [GOBBLIN-1587] Bump version of code cov plugin 
+* [GOBBLIN-1585] Fix for GaaS (DagManager) keep retrying a failed job beyond 
max attempt number 
+* [GOBBLIN-1584] Add replace record logic for Mysql writer 
+* [GOBBLIN-1583] Add System level job start SLA 
+* [GOBBLIN-1582] Fill low/high watermark info in SourceState for 
QueryBasedSource 
+* [GOBBLIN-1581] Iterate over Sql ResultSet in Only the Forward Direction 
+* [GOBBLIN-1580] Check table exists instead of calling create table directly to 
make sure table exists 
+* [GOBBLIN-1578] Avoid deletion of data while dropping a hive table  
+* [GOBBLIN-1577] Change the multiplier used in ExponentialWaitStrategy 
+* [GOBBLIN-1576] Skip appending record count to staging file 
+* [GOBBLIN-1575] Use reference count in helix manager, so that 
connect/disconnect are called once and at the right time 
+* [GOBBLIN-1574] Added whitelist for iceberg tables to add new partition
+* [GOBBLIN-1573] Fix the ClassNotFoundException in streaming test pipeline 
+* [GOBBLIN-1565] Make GMCEWriter fault tolerant so that one topic failure will 
not affect other topics in the same container 
+* [GOBBLIN-1564] Codestyle changes, typo corrections, improved javadoc 
+* [GOBBLIN-1552] Determine flow status correctly when dag manager is disabled 
+* [GOBBLIN-1492] Optimize flowspec keys on configToProperties 
+
 GOBBLIN 0.16.0
 --------------
 
