This is an automated email from the ASF dual-hosted git repository.
rexxiong pushed a change to branch serverless-spark/release-0.5.4-1.1
in repository https://gitbox.apache.org/repos/asf/celeborn.git
at b35fb5072 [CELEBORN-1792] MemoryManager resume should use
pinnedDirectMemory instead of usedDirectMemory
This branch includes the following new commits:
new 21264a525 [FLINK] remove uid and gid from dockerfile.inner
Link: https://code.alibaba-inc.com/soe/celeborn/codereview/12260282
new e81ae5e21 [CELEBORN-INNER] Change to inner 1.17 version
new eeed6268e [CELEBORN-INNER]Filter network exception log by ip prefix
new 681cd2877 [CELEBORN-INNER] Always reponse
HeartbeatFromApplicationResponse
new f80532aa0 [CELEBORN-INNER] Log Exception when master initialize failed
Link: https://code.alibaba-inc.com/soe/celeborn/codereview/14357663
new 97c2368fa [CELEBORN-INNER] Support Authentication
new 07688a3e8 [CELEBORN-INNER] Support QuotaManager Link:
https://code.alibaba-inc.com/soe/celeborn/codereview/14114182
new a78f564e8 [CELEBORN-INNER] support io encryption Link:
https://code.alibaba-inc.com/soe/celeborn/codereview/14268840
new 6d23f91a1 [CELEBORN-INNER] Retry bind host to wait DNS reconcile Link:
https://code.alibaba-inc.com/soe/celeborn/codereview/14462124
new 67164a1f7 [CELEBORN-INNER] Support Worker level throttle for Single
Tenant App
new e104c5532 [CELEBORN-INNER] Support dynamic APPFlowController at worker
side Link: https://code.alibaba-inc.com/soe/celeborn/codereview/14510910 *
[CELEBORN-INNER] Support dynamic APPFlowController at worker side
new 271197a0a [CELEBORN-INNER] Support encrypted password for database
connection. Link: https://code.alibaba-inc.com/soe/celeborn/codereview/14415218
new 3e506b6d1 [CELEBORN-INNER] Fix conf & improve log Link:
https://code.alibaba-inc.com/soe/celeborn/codereview/14584987
new 985ab95fb [CELEBORN-INNER] jobManager fine-grained failover
new e7f2b86e5 [CELEBORN-INNER] Fix compatible issues with client of 0.3.0.
new 6b3e77d66 [CELEBORN-INNER] Fix compatible issues between worker and
master from 0.3.0-0.4.0.
new 58dc5605d [CELEBORN-INNER] Add log when status system apply
transaction log encounters exception.
new 8a9b9de06 [CELEBORN-INNER] Change registerStream loglevel from debug
to info.
new 558c69428 [CELEBORN-INNER] Resource.proto change
highWorkload/storageType to optional for upgrade master from 0.3.0 to 0.4.0
new 4ce8fd5ad [CELEBORN-INNER] Add haclient.MasterNotLeaderException for
compatibility between 0.4 worker to 0.3 master and 0.3 client to 0.4 master
new d1deed4f5 [CELEBORN-INNER] Improve legacy message decode, change
loglevel to DEBUG.
new 3955bbc92 [CELEBORN-INNER] Add custom cluster id to worker host
new cb8a9a97e Change register Stream log level from info to debug.
new cb142041b [CELEBORN-INNER] modify storagetype with WorkerInfo
new d715fc2d1 [CELEBORN-INNER] remove customClusterId from idns if
hostnameWithClusterId is false Link:
https://code.alibaba-inc.com/soe/celeborn/codereview/16272420
new e302eed36 [CELEBORN-INNER] Remove add metrics when checkQuota.
new 9f10ccfb0 [CELEBORN-INNER] Support rass. Link:
https://code.alibaba-inc.com/soe/celeborn/codereview/16598300
new c638c95b2 [CELEBORN-INNER] Fix jmfailover by replacing
filesystem.getStatus api
new 8c3d08f4f [CELEBORN-INNER] Fix jmfailover, worker status Serialize.
new dbabc3d6c [CELEBORN-INNER][fix #52949890]Support expire app data when
app quota exceed Link:
https://code.alibaba-inc.com/soe/celeborn/codereview/14848880 *
[CELEBORN-INNER][fix #52949890]Support expire app data when app quota exceed
new f2fd79893 [CELEBORN-INNER] Support Celeborn auto scale. Link:
https://code.alibaba-inc.com/soe/celeborn/codereview/16587393
new 482857a41 [CELEBORN-INNER] Support auto scale. Link:
https://code.alibaba-inc.com/soe/celeborn/codereview/16988043 *
[CELEBORN-INNER][FOLLOWUP] Support auto scale
new 83d198fe9 [CELEBORN-INNER] remove system.println sts path
new a6199be77 [CELEBORN-INNER] add network metrics. Link:
https://code.alibaba-inc.com/soe/celeborn/codereview/17431342 *
[CELEBORN-INNER] support network metrics
new a34f2fc7f [CELEBORN-INNER] build docker base image Link:
https://code.alibaba-inc.com/soe/celeborn/codereview/17472757
new c0845ad5e [CELEBORN-INNER] fix compatibility problem for
MasterNotLeaderException
new 1cdb9d898 [CELEBORN-INNER] support scale up/scale down when workers
not all ready. Link:
https://code.alibaba-inc.com/soe/celeborn/codereview/17554893
new 30db7997a [CELEBORN-INNER] fix unregister shuffle
new e28bf84db [CELEBORN-INNER] Align configuration name
new 07ee9d782 [CELEBORN-INNER] Fix merge problem/db/idns/proto
new ffb8d030d [CELEBORN-INNER] Support verify worker use product tenant
new dda8e82fe [CELEBORN-INNER] Fix database password encrypt problem
new 20174ab7e [CELEBORN-INNER] fix resourceComsuption compute
new 4d76db30f [CELEBORN-INNER] bind to 0.0.0.0 for http server
new 2c71bf565 [CELEBORN-INNER] bind to 0.0.0.0 for worker http server
new 6acd3332e [CELEBORN-INNER] improve quota manager
new cc74c0203 [CELEBORN-INNER] Fix compatibility problem for internal port
new 2449b09ba Support migrate recovery path when work start
new 5f06787e0 [CELEBORN-INNER] Only reformat code
new 87d93334e [CELEBORN-INNER] Improve resource consumption and fix
compute application resource consumption. Link:
https://code.alibaba-inc.com/soe/celeborn/codereview/17924384
new 017e6f2f5 Migrate worker meta when old dir exists
new 2bee12a15 Link:
https://code.alibaba-inc.com/soe/celeborn/codereview/17993511
new 45cd6f03a Add metrics for master meta usable space and worker recover
usable space
new ddd4c285c [CELEBORN-INNER][Fix 59287953] Support early quota warning
Link: https://code.alibaba-inc.com/soe/celeborn/codereview/18106758
new e3e9d1e96 [CELEBORN-INNER] add appid in checkquota Link:
https://code.alibaba-inc.com/soe/celeborn/codereview/17969146
new 12d156258 [CELEBORN-INNER] Fix flink compile problem.
new 03fdf9a1e [CELEBORN-1586] Add available workers Metrics
new 24cb83222 [CELEBORN-INNER] if workerHostIDNSEnabled, use getHostName
as hostname instead of fqdn hostname obtained from getCanonicalHostName Link:
https://code.alibaba-inc.com/soe/celeborn/codereview/14356758
new 5214885b4 [CELEBORN-INNER] update doc
new dd8cdfa60 Amend lost shuffles
new 022e6ccb0 [CELEBORN-INNER] Fix revise/delete app, add
revise/delete/refresh conf restapi
new 6ce08a32d Add http endpoint for liveness probe
new 45547a073 rpc port bind to 0.0.0.0
new 0f065e900 [CELEBORN-INNER][fix #60023899] Check SPARK_LOCAL_IP, if
exists use as Celeborn Local Ip Link:
https://code.alibaba-inc.com/soe/celeborn/codereview/18572679 *
[CELEBORN-INNER] Check SPARK_LOCAL_IP, if exists use as Celeborn Local Ip
new 3ece1f408 Allow all compression algorithm for flink 1.17
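This code review mainly adds support for buffer compression by introducing
the abstract method `getBufferCompressor` and using `CompressionCodec`. It
implements version-specific compression handling for different Flink
versions, so that Flink 1.14 and 1.15 support only the LZ4 compression
algorithm while other versions offer more flexible compression choices, and
it adjusts the components that depend on this compression logic. Link: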
https://code.alibaba-inc.com/soe/celeborn/codereview/18671303
new d97e45835 [CELEBORN-INNER] fix ut
new 1202ea5ce [CELEBORN-INNER] Support build client for spark4
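This code review mainly covers minor optimizations and adaptations across
several modules of the Apache Celeborn project, including changing the
optional-value matching logic in Scala code, correcting the copyright and
license files, and configuring modules for Spark 4.0. It also improves
dependency management and the shading of Netty-related libraries, enhancing
cross-version compatibility. Link: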
https://code.alibaba-inc.com/soe/celeborn/codereview/18742342 *
[CELEBORN-INNER] Support build client for spark4
new 9510ac0c2 [CELEBORN-INNER] fix flink use retry client
new aa789bdad [CELEBORN-INNER] fix compression
new 624db7d85 [CELEBORN-INNER] Add amend shuffle configuration
new 6dcc83014 [CELEBORN-INNER] Support decompress parameter for shuffle
reader Link: https://code.alibaba-inc.com/soe/celeborn/codereview/19036201
new 3660bc4e6 build arm image in jenkins
new dfa3821f3 [CELEBORN-1760] OOM causes disk buffer unable to be released
new 861b6a445 [CELEBORN-1765] Fix NPE when removeFileInfo in StorageManager
new 659c5bcbb [CELEBORN-1769] Fix packed partition location cause
GetReducerFileGroupResponse lose location
new 6f3eddeb2 fix 61414176 add dispatcher threads
new 4e0d3578a [fix #61569447] Tenant quota check should according to
MetaSystem Link: https://code.alibaba-inc.com/soe/celeborn/codereview/19529343
* [fix #61569447] Tenant quotacheck should according to MetaSystem
new 005de7034 [CELEBORN-INNER] fix sbt for inner
new f9b5f4507 [CELEBORN-INNER] update configuration doc
new 1a16df327 [CELEBORN-INNER] fix reader &ut
new f215d24db [CELEBORN-INNER] fix pb resourceConsumption ut
new f75ed161f [CELEBORN-INNER] fix port conflict ut and add new
configuration for constant port
new 75463d7d7 [CELEBORN-INNER] fix master quota/scale ut
new 4e2086d0b [CELEBORN-INNER] pass worker ut
new e8a1e2e68 [CELEBORN-INNER] format code
new 2645eb0c7 [CELEBORN-INNER] fix appFlowController get tenantId
new c2de596d7 [CELEBORN-INNER] Support sbt compile inner flink1.17
new c3cdff580 [CELEBORN-INNER] fix worker index
new 408bdf9f6 [CELEBORN-INNER] fix worker index
new 9652082a6 fix#61780839 Celeborn proxy adapt with Celeborn 0.5.0
new 07058ad04 to #61780839 update master client for celeborn proxy
new abc1365cc [CELEBORN-1770] FlushNotifier should setException for all
Throwables in Flusher
new 39c09579a Revert "[CELEBORN-1376] Push data failed should always
release request body"
new 9c09eb4bd [CELEBORN-1510] Partial task unable to switch to the replica
new d549cf1dc [CELEBORN-1721][CIP-12] Support HARD_SPLIT in PushMergedData
new 5c7abe748 [CELEBORN-INNER] fix APPFlowController when tenantId is null
new b010e2944 [CELEBORN-INNER] fix SortBasedPusherSuiteJ.
new 00e57560e [CELEBORN-INNER] fix SortBasedShuffleWriterSuiteJ.
new cbe1c9e22 [CELEBORN-INNER] change test log level to info.
new 5cbfda3ae [CELEBORN-INNER] disable replicate in ut
new 6bb792c0c [CELEBORN-INNER] change encryption enabled to false
new a4201a661 [CELEBORN-INNER] disable inner authentication
new 561abc161 [CELEBORN-INNER] fix ut
new 56206952a [CELEBORN-1743] Resolve the metrics data interruption and
the job failure caused by locked resources
new 9e5c07531 [CELEBORN-1500] Filter out empty InputStreams
new 6f5bce946 [CELEBORN-1782] Worker in congestion control should be in
blacklist to avoid impact new shuffle
new 4c76d1de6 [CELEBORN-1783] Fix Pending task in commitThreadPool wont be
canceled
new 778fd027c [CELEBORN-1763] Fix DataPusher be blocked for a long time
new dc4bdbeca to #61830366 extend proxy apis
new 99cc9d219 to #62097184 adapt EMR-ZONE environment
new 628bc3ff6 [CELEBORN-1319] Optimize skew partition logic for Reduce
Mode to avoid sorting shuffle files
new 9aea98a98 [CELEBORN-INNER] fix ut compile
new fef98f22e [CELEBORN-INNER] important: keep compatible with inner proto
new 86ec09a92 [CELEBORN-INNER] support skew shuffle rerun stage
new 2c6018175 to #60512610 Refine selection logic
new 3bae14dc1 fix #62531036 Fix proxy selection uts
new 3e2bde086 to #61780839 update docker file
new ad5d23bc7 [CELEBORN-1818] Fix incorrect timeout exception when waiting
on no pending writes
new 40e29f4fe to #62556294 update exist region and zone for cluster info
items
new a32df8053 to #63045080 change proxy enable config item
new aed2dd1ee [CELEBORN-1721][FOLLOWUP] Fix the problem of getting
partition location in ShuffleClientImpl during soft split
new 492577a2c [fix# 63302560] fix quota format
new ffad317a0 [fix#63334141] skip soft split when push merge data Link:
https://code.alibaba-inc.com/soe/celeborn/codereview/20380569
new 94d1e0b1d [CELEBORN-1701][FOLLOWUP] Support stage rerun for shuffle
data lost
new bdc9ef7ac [CELEBORN-1838] Interrupt spark task should not report fetch
failure
new ba4893f3b [CELEBORN-1850] Setup worker endpoint after initalizing
controller
new ceebc4a30 #63675918 memory manager supports dynamic configs
new ef382f88a [CELEBORN-1831] Add ratis commitIndex metrics
new 748c1394e to 63295277 handle open stream synchronous to asynchronous
new 356670467 to 63295277 fix CI error caused by fetchHandler update
new 6f9ef025d fix#63759651 disable expire app when quota exceeded
new cf3fd377d to #63521717 1. memory available check serving state 2. add
new config to pause memory storage from using by memory pressure
new 2bcad3fed [fix #64528553] shouldn't tracking hardsplit batch Link:
https://code.alibaba-inc.com/soe/celeborn/codereview/20893917
new c7ee2e7cb [CELEBORN-1865] Update master endpointRef when master leader
is abnormal
new 7db7f4ad2 [CELEBORN-1757] Add retry when sending RPC to
LifecycleManager
new 4bc41addd [CELEBORN-1883] Replace HashSet with
ConcurrentHashMap.newKeySet for ShuffleFileGroups
new 8f443e03a [CELEBORN-1885] Fix nullptr exceptions in FetchChunk after
worker restart
new ae7bfa17a [CELEBORN-1879] Ignore invalid chunk range generated by
splitSkewedPartitionLocations
new 2e1cddab1 Bump 0.5.4
new dfee7f00f fix compile
new b2d5ccc27 fix #64570079 scale down should align with cluster expect
scale down number
new 79d51bcd5 to 63521722 update helm
new e57c9b6bb fix #0 format code
new 9707c3ea3 fix #64873863 disable exceed congestion control high
watermark cause worker high workload
new 4b59ef45f fix #64878410 unify-useridentifier
new b35fb5072 [CELEBORN-1792] MemoryManager resume should use
pinnedDirectMemory instead of usedDirectMemory
The 146 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails. The revisions
listed as "add" were already present in the repository and have only
been added to this reference.