ReemaAlzaid commented on PR #11615:
URL: https://github.com/apache/incubator-gluten/pull/11615#issuecomment-3904356968
Here are the relevant test cases I ran:
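For context, this is a hypothetical reconstruction of what `test_iceberg_simple.py` does, inferred from the log output below (table name, row values, app name, and warehouse path appear in the log; the catalog configuration is an assumption), so the actual script may differ. It needs a Spark runtime with the Iceberg runtime jar on the classpath:

```python
# Hypothetical reconstruction of test_iceberg_simple.py, inferred from the
# log below; the catalog settings are assumptions, not the committed script.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("iceberg-input-file-metadata-test")
    # Hadoop catalog named "local", matching the identifier
    # local.default.test_table and the warehouse path seen in the log.
    .config("spark.sql.catalog.local", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.local.type", "hadoop")
    .config("spark.sql.catalog.local.warehouse", "file:/tmp/iceberg_warehouse")
    .getOrCreate()
)

spark.sql(
    "CREATE TABLE IF NOT EXISTS local.default.test_table "
    "(id INT, name STRING) USING iceberg"
)
spark.sql(
    "INSERT INTO local.default.test_table "
    "VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Charlie')"
)

# The three input-file metadata expressions under test; with the bug, the
# file path comes back empty and block start/length come back as -1.
for expr, label in [
    ("input_file_name()", "File"),
    ("input_file_block_start()", "Block Start"),
    ("input_file_block_length()", "Block Length"),
]:
    rows = spark.sql(
        f"SELECT id, name, {expr} AS v FROM local.default.test_table ORDER BY id"
    ).collect()
    for r in rows:
        print(f"ID: {r.id}, Name: {r.name}, {label}: {r.v!r}")
```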
### Before
```
export GLUTEN_JAR=/Users/reema/Desktop/OpenSource/incubator-gluten/package/target/gluten-velox-bundle-spark3.5_2.12-darwin_aarch64-1.7.0-SNAPSHOT.jar
export ICEBERG_JAR=/tmp/iceberg.jar
spark-submit \
  --jars "$GLUTEN_JAR,$ICEBERG_JAR" \
  --conf spark.plugins=org.apache.gluten.GlutenPlugin \
  --conf spark.gluten.sql.columnar.backend.lib=velox \
  --conf spark.gluten.enabled=true \
  --conf spark.driver.extraClassPath="$GLUTEN_JAR:$ICEBERG_JAR" \
  --conf spark.executor.extraClassPath="$GLUTEN_JAR:$ICEBERG_JAR" \
  test_iceberg_simple.py
26/02/15 15:15:56 WARN Utils: Your hostname, Reemas-MacBook-Pro.local
resolves to a loopback address: 127.0.0.1; using 192.168.100.32 instead (on
interface en0)
26/02/15 15:15:56 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to
another address
26/02/15 15:15:56 WARN NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
26/02/15 15:15:57 INFO SparkContext: Running Spark version 3.5.5
26/02/15 15:15:57 INFO SparkContext: OS info Mac OS X, 15.6, aarch64
26/02/15 15:15:57 INFO SparkContext: Java version 17.0.18
26/02/15 15:15:57 INFO ResourceUtils:
==============================================================
26/02/15 15:15:57 INFO ResourceUtils: No custom resources configured for
spark.driver.
26/02/15 15:15:57 INFO ResourceUtils:
==============================================================
26/02/15 15:15:57 INFO SparkContext: Submitted application:
iceberg-input-file-metadata-test
26/02/15 15:15:57 INFO ResourceProfile: Default ResourceProfile created,
executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: ,
memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name:
offHeap, amount: 2048, script: , vendor: ), task resources: Map(cpus -> name:
cpus, amount: 1.0)
26/02/15 15:15:57 INFO ResourceProfile: Limiting resource is cpu
26/02/15 15:15:57 INFO ResourceProfileManager: Added ResourceProfile id: 0
26/02/15 15:15:57 INFO SecurityManager: Changing view acls to: reema
26/02/15 15:15:57 INFO SecurityManager: Changing modify acls to: reema
26/02/15 15:15:57 INFO SecurityManager: Changing view acls groups to:
26/02/15 15:15:57 INFO SecurityManager: Changing modify acls groups to:
26/02/15 15:15:57 INFO SecurityManager: SecurityManager: authentication
disabled; ui acls disabled; users with view permissions: reema; groups with
view permissions: EMPTY; users with modify permissions: reema; groups with
modify permissions: EMPTY
26/02/15 15:15:57 INFO Utils: Successfully started service 'sparkDriver' on
port 49661.
26/02/15 15:15:57 INFO SparkEnv: Registering MapOutputTracker
26/02/15 15:15:57 INFO SparkEnv: Registering BlockManagerMaster
26/02/15 15:15:57 INFO BlockManagerMasterEndpoint: Using
org.apache.spark.storage.DefaultTopologyMapper for getting topology information
26/02/15 15:15:57 INFO BlockManagerMasterEndpoint:
BlockManagerMasterEndpoint up
26/02/15 15:15:57 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
26/02/15 15:15:57 INFO DiskBlockManager: Created local directory at
/private/var/folders/5z/4mxhbysx1hb00rzxt6wj738m0000gn/T/blockmgr-e6925b7a-9b9b-43a1-8861-1452ad6dda87
26/02/15 15:15:57 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:15:57 INFO SparkEnv: Registering OutputCommitCoordinator
26/02/15 15:15:57 INFO JettyUtils: Start Jetty 0.0.0.0:4040 for SparkUI
26/02/15 15:15:57 INFO Utils: Successfully started service 'SparkUI' on port
4040.
26/02/15 15:15:57 INFO SparkContext: Added JAR
file:///Users/reema/Desktop/OpenSource/incubator-gluten/package/target/gluten-velox-bundle-spark3.5_2.12-darwin_aarch64-1.7.0-SNAPSHOT.jar
at
spark://192.168.100.32:49661/jars/gluten-velox-bundle-spark3.5_2.12-darwin_aarch64-1.7.0-SNAPSHOT.jar
with timestamp 1771157757097
26/02/15 15:15:57 INFO SparkContext: Added JAR
file:///private/tmp/iceberg.jar at
spark://192.168.100.32:49661/jars/iceberg.jar with timestamp 1771157757097
26/02/15 15:15:57 INFO Discovery: Start discovering components in the
current classpath...
26/02/15 15:15:57 INFO Discovery: Discovered component files:
org.apache.gluten.backendsapi.velox.VeloxBackend,
org.apache.gluten.component.VeloxIcebergComponent. Duration: 8 ms.
26/02/15 15:15:57 INFO package: Components registered within order: velox,
velox-iceberg
26/02/15 15:15:57 INFO GlutenDriverPlugin: Gluten components:
==============================================================
Component velox
velox_branch = HEAD
velox_revision = f247a8e922c4802fd9b9cf7a626421bff9b803fd
velox_revisionTime = 2026-02-07 14:11:45 +0000
Component velox-iceberg
==============================================================
26/02/15 15:15:57 INFO SubstraitBackend: Gluten build info:
==============================================================
Gluten Version: 1.7.0-SNAPSHOT
GCC Version:
Java Version: 17
Scala Version: 2.12.15
Spark Version: 3.5.5
Hadoop Version: 2.7.4
Gluten Branch: main
Gluten Revision: be3eeea8c33ddfb5352a37ad7d169e326c4dc1ba
Gluten Revision Time: 2026-02-13 22:47:03 +0000
Gluten Build Time: 2026-02-15T12:07:38Z
Gluten Repo URL: https://github.com/ReemaAlzaid/incubator-gluten.git
==============================================================
26/02/15 15:15:57 INFO VeloxListenerApi: Memory overhead is not set. Setting
it to 644245094 automatically. Gluten doesn't follow Spark's calculation on
default value of this option because the actual required memory overhead will
depend on off-heap usage than on on-heap usage.
26/02/15 15:15:57 INFO SparkDirectoryUtil: Created local directory at
/private/var/folders/5z/4mxhbysx1hb00rzxt6wj738m0000gn/T/gluten-27f14ba0-aa7c-45b8-8c18-2bf2e896d45b
26/02/15 15:15:57 INFO JniWorkspace: Creating JNI workspace in root
directory
/private/var/folders/5z/4mxhbysx1hb00rzxt6wj738m0000gn/T/gluten-27f14ba0-aa7c-45b8-8c18-2bf2e896d45b/jni/182e3935-f5c1-4a64-86fb-377b2af85cd7
26/02/15 15:15:57 INFO JniWorkspace: JNI workspace
/private/var/folders/5z/4mxhbysx1hb00rzxt6wj738m0000gn/T/gluten-27f14ba0-aa7c-45b8-8c18-2bf2e896d45b/jni/182e3935-f5c1-4a64-86fb-377b2af85cd7/gluten-13074889086958015281
created in root directory
/private/var/folders/5z/4mxhbysx1hb00rzxt6wj738m0000gn/T/gluten-27f14ba0-aa7c-45b8-8c18-2bf2e896d45b/jni/182e3935-f5c1-4a64-86fb-377b2af85cd7
26/02/15 15:15:57 INFO JniLibLoader: Read real path
/private/var/folders/5z/4mxhbysx1hb00rzxt6wj738m0000gn/T/gluten-27f14ba0-aa7c-45b8-8c18-2bf2e896d45b/jni/182e3935-f5c1-4a64-86fb-377b2af85cd7/gluten-13074889086958015281/darwin/aarch64/libgluten.dylib
for libPath
/private/var/folders/5z/4mxhbysx1hb00rzxt6wj738m0000gn/T/gluten-27f14ba0-aa7c-45b8-8c18-2bf2e896d45b/jni/182e3935-f5c1-4a64-86fb-377b2af85cd7/gluten-13074889086958015281/darwin/aarch64/libgluten.dylib
26/02/15 15:15:57 INFO JniLibLoader: Library
/private/var/folders/5z/4mxhbysx1hb00rzxt6wj738m0000gn/T/gluten-27f14ba0-aa7c-45b8-8c18-2bf2e896d45b/jni/182e3935-f5c1-4a64-86fb-377b2af85cd7/gluten-13074889086958015281/darwin/aarch64/libgluten.dylib
has been loaded using path-loading method
26/02/15 15:15:57 INFO JniLibLoader: Library
/private/var/folders/5z/4mxhbysx1hb00rzxt6wj738m0000gn/T/gluten-27f14ba0-aa7c-45b8-8c18-2bf2e896d45b/jni/182e3935-f5c1-4a64-86fb-377b2af85cd7/gluten-13074889086958015281/darwin/aarch64/libgluten.dylib
has been loaded
26/02/15 15:15:57 INFO JniLibLoader: Successfully loaded library
darwin/aarch64/libgluten.dylib
26/02/15 15:15:57 INFO JniLibLoader: Read real path
/private/var/folders/5z/4mxhbysx1hb00rzxt6wj738m0000gn/T/gluten-27f14ba0-aa7c-45b8-8c18-2bf2e896d45b/jni/182e3935-f5c1-4a64-86fb-377b2af85cd7/gluten-13074889086958015281/darwin/aarch64/libvelox.dylib
for libPath
/private/var/folders/5z/4mxhbysx1hb00rzxt6wj738m0000gn/T/gluten-27f14ba0-aa7c-45b8-8c18-2bf2e896d45b/jni/182e3935-f5c1-4a64-86fb-377b2af85cd7/gluten-13074889086958015281/darwin/aarch64/libvelox.dylib
26/02/15 15:15:57 INFO JniLibLoader: Library
/private/var/folders/5z/4mxhbysx1hb00rzxt6wj738m0000gn/T/gluten-27f14ba0-aa7c-45b8-8c18-2bf2e896d45b/jni/182e3935-f5c1-4a64-86fb-377b2af85cd7/gluten-13074889086958015281/darwin/aarch64/libvelox.dylib
has been loaded using path-loading method
26/02/15 15:15:57 INFO JniLibLoader: Library
/private/var/folders/5z/4mxhbysx1hb00rzxt6wj738m0000gn/T/gluten-27f14ba0-aa7c-45b8-8c18-2bf2e896d45b/jni/182e3935-f5c1-4a64-86fb-377b2af85cd7/gluten-13074889086958015281/darwin/aarch64/libvelox.dylib
has been loaded
26/02/15 15:15:57 INFO JniLibLoader: Successfully loaded library
darwin/aarch64/libvelox.dylib
W20260215 15:15:57.885989 14490670 MemoryArbitrator.cpp:84] Query memory
capacity[460.50MB] is set for NOOP arbitrator which has no capacity enforcement
26/02/15 15:15:57 INFO DriverPluginContainer: Initialized driver component
for plugin org.apache.gluten.GlutenPlugin.
26/02/15 15:15:57 INFO Executor: Starting executor ID driver on host
192.168.100.32
26/02/15 15:15:57 INFO Executor: OS info Mac OS X, 15.6, aarch64
26/02/15 15:15:57 INFO Executor: Java version 17.0.18
26/02/15 15:15:57 INFO Executor: Starting executor with user classpath
(userClassPathFirst = false):
'file:/Users/reema/Desktop/OpenSource/incubator-gluten/package/target/gluten-velox-bundle-spark3.5_2.12-darwin_aarch64-1.7.0-SNAPSHOT.jar,file:/tmp/iceberg.jar,file:/Users/reema/Desktop/OpenSource/incubator-gluten/gluten-velox-bundle-spark3.5_2.12-darwin_aarch64-1.7.0-SNAPSHOT.jar,file:/Users/reema/Desktop/OpenSource/incubator-gluten/iceberg.jar'
26/02/15 15:15:57 INFO Executor: Created or updated repl class loader
org.apache.spark.util.MutableURLClassLoader@3d5e1c01 for default.
26/02/15 15:15:57 INFO CodedInputStreamClassInitializer: The
defaultRecursionLimit in protobuf has been increased to 100000
26/02/15 15:15:57 INFO VeloxListenerApi: Gluten is running with Spark local
mode. Skip running static initializer for executor.
26/02/15 15:15:57 INFO ExecutorPluginContainer: Initialized executor
component for plugin org.apache.gluten.GlutenPlugin.
26/02/15 15:15:57 INFO Utils: Successfully started service
'org.apache.spark.network.netty.NettyBlockTransferService' on port 49662.
26/02/15 15:15:57 INFO NettyBlockTransferService: Server created on
192.168.100.32:49662
26/02/15 15:15:57 INFO BlockManager: Using
org.apache.spark.storage.RandomBlockReplicationPolicy for block replication
policy
26/02/15 15:15:57 INFO BlockManagerMaster: Registering BlockManager
BlockManagerId(driver, 192.168.100.32, 49662, None)
26/02/15 15:15:57 INFO BlockManagerMasterEndpoint: Registering block manager
192.168.100.32:49662 with 2.4 GiB RAM, BlockManagerId(driver, 192.168.100.32,
49662, None)
26/02/15 15:15:57 INFO BlockManagerMaster: Registered BlockManager
BlockManagerId(driver, 192.168.100.32, 49662, None)
26/02/15 15:15:57 INFO BlockManager: Initialized BlockManager:
BlockManagerId(driver, 192.168.100.32, 49662, None)
26/02/15 15:15:58 INFO VeloxBackend: Gluten SQL Tab has been attached.
26/02/15 15:15:58 INFO SparkShimLoader: Loading Spark Shims for version:
3.5.5
26/02/15 15:15:58 INFO SparkShimLoader: Using Shim provider:
List(org.apache.gluten.sql.shims.spark35.SparkShimProvider@4339652b)
================================================================================
Creating Iceberg table...
================================================================================
26/02/15 15:15:58 INFO SharedState: Setting hive.metastore.warehouse.dir
('null') to the value of spark.sql.warehouse.dir.
26/02/15 15:15:58 INFO SharedState: Warehouse path is
'file:/Users/reema/Desktop/OpenSource/incubator-gluten/spark-warehouse'.
26/02/15 15:15:58 INFO CatalogUtil: Loading custom FileIO implementation:
org.apache.iceberg.hadoop.HadoopFileIO
26/02/15 15:15:59 INFO BaseMetastoreCatalog: Table properties set at catalog
level through catalog properties: {}
26/02/15 15:15:59 INFO BaseMetastoreCatalog: Table properties enforced at
catalog level through catalog properties: {}
26/02/15 15:15:59 INFO HadoopTableOperations: Committed a new metadata file
file:/tmp/iceberg_warehouse/default/test_table/metadata/v1.metadata.json
26/02/15 15:15:59 WARN GlutenFallbackReporter: Validation failed for plan:
AppendData[QueryId=1], due to: [FallbackByBackendSettings] Validation failed on
node AppendData
26/02/15 15:15:59 INFO CodeGenerator: Code generated in 120.717458 ms
26/02/15 15:15:59 INFO MemoryStore: Block broadcast_0 stored as values in
memory (estimated size 32.0 KiB, free 2.4 GiB)
26/02/15 15:15:59 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes
in memory (estimated size 29.7 KiB, free 2.4 GiB)
26/02/15 15:15:59 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory
on 192.168.100.32:49662 (size: 29.7 KiB, free: 2.4 GiB)
26/02/15 15:15:59 INFO SparkContext: Created broadcast 0 from broadcast at
SparkWrite.java:195
26/02/15 15:15:59 INFO AppendDataExec: Start processing data source write
support: IcebergBatchWrite(table=local.default.test_table, format=PARQUET). The
input RDD has 3 partitions.
26/02/15 15:15:59 INFO SparkContext: Starting job: sql at
NativeMethodAccessorImpl.java:0
26/02/15 15:15:59 INFO DAGScheduler: Got job 0 (sql at
NativeMethodAccessorImpl.java:0) with 3 output partitions
26/02/15 15:15:59 INFO DAGScheduler: Final stage: ResultStage 0 (sql at
NativeMethodAccessorImpl.java:0)
26/02/15 15:15:59 INFO DAGScheduler: Parents of final stage: List()
26/02/15 15:15:59 INFO DAGScheduler: Missing parents: List()
26/02/15 15:15:59 INFO DAGScheduler: Submitting ResultStage 0
(MapPartitionsRDD[1] at sql at NativeMethodAccessorImpl.java:0), which has no
missing parents
26/02/15 15:15:59 INFO MemoryStore: Block broadcast_1 stored as values in
memory (estimated size 7.8 KiB, free 2.4 GiB)
26/02/15 15:15:59 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes
in memory (estimated size 4.4 KiB, free 2.4 GiB)
26/02/15 15:15:59 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory
on 192.168.100.32:49662 (size: 4.4 KiB, free: 2.4 GiB)
26/02/15 15:15:59 INFO SparkContext: Created broadcast 1 from broadcast at
DAGScheduler.scala:1585
26/02/15 15:15:59 INFO DAGScheduler: Submitting 3 missing tasks from
ResultStage 0 (MapPartitionsRDD[1] at sql at NativeMethodAccessorImpl.java:0)
(first 15 tasks are for partitions Vector(0, 1, 2))
26/02/15 15:15:59 INFO TaskSchedulerImpl: Adding task set 0.0 with 3 tasks
resource profile 0
26/02/15 15:15:59 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID
0) (192.168.100.32, executor driver, partition 0, PROCESS_LOCAL, 9503 bytes)
26/02/15 15:15:59 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID
1) (192.168.100.32, executor driver, partition 1, PROCESS_LOCAL, 9503 bytes)
26/02/15 15:15:59 INFO TaskSetManager: Starting task 2.0 in stage 0.0 (TID
2) (192.168.100.32, executor driver, partition 2, PROCESS_LOCAL, 9503 bytes)
26/02/15 15:15:59 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
26/02/15 15:15:59 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
26/02/15 15:15:59 INFO Executor: Running task 2.0 in stage 0.0 (TID 2)
26/02/15 15:16:00 INFO CodecPool: Got brand-new compressor [.zstd]
26/02/15 15:16:00 INFO CodecPool: Got brand-new compressor [.zstd]
26/02/15 15:16:00 INFO CodecPool: Got brand-new compressor [.zstd]
26/02/15 15:16:00 INFO DataWritingSparkTask: Writer for partition 0 is
committing.
26/02/15 15:16:00 INFO DataWritingSparkTask: Writer for partition 2 is
committing.
26/02/15 15:16:00 INFO DataWritingSparkTask: Writer for partition 1 is
committing.
26/02/15 15:16:00 INFO DataWritingSparkTask: Committed partition 1 (task 1,
attempt 0, stage 0.0)
26/02/15 15:16:00 INFO DataWritingSparkTask: Committed partition 0 (task 0,
attempt 0, stage 0.0)
26/02/15 15:16:00 INFO DataWritingSparkTask: Committed partition 2 (task 2,
attempt 0, stage 0.0)
26/02/15 15:16:00 INFO Executor: Finished task 2.0 in stage 0.0 (TID 2).
4118 bytes result sent to driver
26/02/15 15:16:00 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0).
4114 bytes result sent to driver
26/02/15 15:16:00 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1).
4110 bytes result sent to driver
26/02/15 15:16:00 INFO TaskSetManager: Finished task 2.0 in stage 0.0 (TID
2) in 405 ms on 192.168.100.32 (executor driver) (1/3)
26/02/15 15:16:00 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID
0) in 429 ms on 192.168.100.32 (executor driver) (2/3)
26/02/15 15:16:00 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID
1) in 406 ms on 192.168.100.32 (executor driver) (3/3)
26/02/15 15:16:00 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks
have all completed, from pool
26/02/15 15:16:00 INFO DAGScheduler: ResultStage 0 (sql at
NativeMethodAccessorImpl.java:0) finished in 0.476 s
26/02/15 15:16:00 INFO DAGScheduler: Job 0 is finished. Cancelling potential
speculative or zombie tasks for this job
26/02/15 15:16:00 INFO TaskSchedulerImpl: Killing all running tasks in stage
0: Stage finished
26/02/15 15:16:00 INFO DAGScheduler: Job 0 finished: sql at
NativeMethodAccessorImpl.java:0, took 0.513478 s
26/02/15 15:16:00 INFO AppendDataExec: Data source write support
IcebergBatchWrite(table=local.default.test_table, format=PARQUET) is committing.
26/02/15 15:16:00 INFO SparkWrite: Committing append with 3 new data files
to table local.default.test_table
26/02/15 15:16:00 INFO HadoopTableOperations: Committed a new metadata file
file:/tmp/iceberg_warehouse/default/test_table/metadata/v2.metadata.json
26/02/15 15:16:00 INFO SnapshotProducer: Committed snapshot
8759041077900200141 (MergeAppend)
26/02/15 15:16:00 INFO LoggingMetricsReporter: Received metrics report:
CommitReport{tableName=local.default.test_table,
snapshotId=8759041077900200141, sequenceNumber=1, operation=append,
commitMetrics=CommitMetricsResult{totalDuration=TimerResult{timeUnit=NANOSECONDS,
totalDuration=PT0.302477167S, count=1}, attempts=CounterResult{unit=COUNT,
value=1}, addedDataFiles=CounterResult{unit=COUNT, value=3},
removedDataFiles=null, totalDataFiles=CounterResult{unit=COUNT, value=3},
addedDeleteFiles=null, addedEqualityDeleteFiles=null,
addedPositionalDeleteFiles=null, addedDVs=null, removedDeleteFiles=null,
removedEqualityDeleteFiles=null, removedPositionalDeleteFiles=null,
removedDVs=null, totalDeleteFiles=CounterResult{unit=COUNT, value=0},
addedRecords=CounterResult{unit=COUNT, value=3}, removedRecords=null,
totalRecords=CounterResult{unit=COUNT, value=3},
addedFilesSizeInBytes=CounterResult{unit=BYTES, value=1920},
removedFilesSizeInBytes=null, totalFilesSizeInBytes=CounterResult{unit=BYTES,
value=1920}, addedPositionalDeletes=null,
removedPositionalDeletes=null, totalPositionalDeletes=CounterResult{unit=COUNT,
value=0}, addedEqualityDeletes=null, removedEqualityDeletes=null,
totalEqualityDeletes=CounterResult{unit=COUNT, value=0}, manifestsCreated=null,
manifestsReplaced=null, manifestsKept=null, manifestEntriesProcessed=null},
metadata={engine-version=3.5.5, app-id=local-1771157757901, engine-name=spark,
iceberg-version=Apache Iceberg 1.10.0 (commit
2114bf631e49af532d66e2ce148ee49dd1dd1f1f)}}
26/02/15 15:16:00 INFO SparkWrite: Committed in 322 ms
26/02/15 15:16:00 INFO AppendDataExec: Data source write support
IcebergBatchWrite(table=local.default.test_table, format=PARQUET) committed.
================================================================================
Testing input_file_name() on Iceberg table
================================================================================
=== input_file_name() Results ===
26/02/15 15:16:00 INFO V2ScanRelationPushDown:
Output: id#7, name#8
26/02/15 15:16:00 INFO SnapshotScan: Scanning table local.default.test_table
snapshot 8759041077900200141 created at 2026-02-15T12:16:00.623+00:00 with
filter true
26/02/15 15:16:00 INFO BaseDistributedDataScan: Planning file tasks locally
for table local.default.test_table
26/02/15 15:16:00 INFO SparkPartitioningAwareScan: Reporting
UnknownPartitioning with 1 partition(s) for table local.default.test_table
26/02/15 15:16:00 INFO MemoryStore: Block broadcast_2 stored as values in
memory (estimated size 32.0 KiB, free 2.4 GiB)
26/02/15 15:16:00 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes
in memory (estimated size 29.9 KiB, free 2.4 GiB)
26/02/15 15:16:00 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory
on 192.168.100.32:49662 (size: 29.9 KiB, free: 2.4 GiB)
26/02/15 15:16:00 INFO SparkContext: Created broadcast 2 from collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:56
26/02/15 15:16:00 INFO MemoryStore: Block broadcast_3 stored as values in
memory (estimated size 32.0 KiB, free 2.4 GiB)
26/02/15 15:16:00 INFO MemoryStore: Block broadcast_3_piece0 stored as bytes
in memory (estimated size 29.9 KiB, free 2.4 GiB)
26/02/15 15:16:00 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory
on 192.168.100.32:49662 (size: 29.9 KiB, free: 2.4 GiB)
26/02/15 15:16:00 INFO SparkContext: Created broadcast 3 from collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:56
26/02/15 15:16:00 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
[message repeated 7 more times]
26/02/15 15:16:01 WARN GlutenFallbackReporter: Validation failed for plan:
Project[QueryId=2], due to: fallback input file expression
26/02/15 15:16:01 INFO CodeGenerator: Code generated in 8.318542 ms
26/02/15 15:16:01 INFO MemoryStore: Block broadcast_4 stored as values in
memory (estimated size 32.0 KiB, free 2.4 GiB)
26/02/15 15:16:01 INFO MemoryStore: Block broadcast_4_piece0 stored as bytes
in memory (estimated size 29.9 KiB, free 2.4 GiB)
26/02/15 15:16:01 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory
on 192.168.100.32:49662 (size: 29.9 KiB, free: 2.4 GiB)
26/02/15 15:16:01 INFO SparkContext: Created broadcast 4 from collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:56
26/02/15 15:16:01 INFO SparkContext: Starting job: collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:56
26/02/15 15:16:01 INFO DAGScheduler: Got job 1 (collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:56)
with 1 output partitions
26/02/15 15:16:01 INFO DAGScheduler: Final stage: ResultStage 1 (collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:56)
26/02/15 15:16:01 INFO DAGScheduler: Parents of final stage: List()
26/02/15 15:16:01 INFO DAGScheduler: Missing parents: List()
26/02/15 15:16:01 INFO DAGScheduler: Submitting ResultStage 1
(MapPartitionsRDD[5] at collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:56),
which has no missing parents
26/02/15 15:16:01 INFO MemoryStore: Block broadcast_5 stored as values in
memory (estimated size 16.7 KiB, free 2.4 GiB)
26/02/15 15:16:01 INFO MemoryStore: Block broadcast_5_piece0 stored as bytes
in memory (estimated size 7.0 KiB, free 2.4 GiB)
26/02/15 15:16:01 INFO BlockManagerInfo: Added broadcast_5_piece0 in memory
on 192.168.100.32:49662 (size: 7.0 KiB, free: 2.4 GiB)
26/02/15 15:16:01 INFO SparkContext: Created broadcast 5 from broadcast at
DAGScheduler.scala:1585
26/02/15 15:16:01 INFO DAGScheduler: Submitting 1 missing tasks from
ResultStage 1 (MapPartitionsRDD[5] at collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:56)
(first 15 tasks are for partitions Vector(0))
26/02/15 15:16:01 INFO TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
resource profile 0
26/02/15 15:16:01 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID
3) (192.168.100.32, executor driver, partition 0, PROCESS_LOCAL, 11471 bytes)
26/02/15 15:16:01 INFO Executor: Running task 0.0 in stage 1.0 (TID 3)
26/02/15 15:16:01 INFO CodeGenerator: Code generated in 4.898333 ms
26/02/15 15:16:01 INFO Executor: Finished task 0.0 in stage 1.0 (TID 3).
7086 bytes result sent to driver
26/02/15 15:16:01 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID
3) in 59 ms on 192.168.100.32 (executor driver) (1/1)
26/02/15 15:16:01 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks
have all completed, from pool
26/02/15 15:16:01 INFO DAGScheduler: ResultStage 1 (collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:56)
finished in 0.064 s
26/02/15 15:16:01 INFO DAGScheduler: Job 1 is finished. Cancelling potential
speculative or zombie tasks for this job
26/02/15 15:16:01 INFO TaskSchedulerImpl: Killing all running tasks in stage
1: Stage finished
26/02/15 15:16:01 INFO DAGScheduler: Job 1 finished: collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:56,
took 0.067347 s
ID: 1, Name: Alice, File: ''
ID: 2, Name: Bob, File: ''
ID: 3, Name: Charlie, File: ''
❌ BUG: 3/3 rows have EMPTY file paths!
================================================================================
Testing input_file_block_start() on Iceberg table
================================================================================
=== input_file_block_start() Results ===
26/02/15 15:16:01 INFO V2ScanRelationPushDown:
Output: id#23, name#24
26/02/15 15:16:01 INFO SnapshotScan: Scanning table local.default.test_table
snapshot 8759041077900200141 created at 2026-02-15T12:16:00.623+00:00 with
filter true
26/02/15 15:16:01 INFO BaseDistributedDataScan: Planning file tasks locally
for table local.default.test_table
26/02/15 15:16:01 INFO SparkPartitioningAwareScan: Reporting
UnknownPartitioning with 1 partition(s) for table local.default.test_table
26/02/15 15:16:01 INFO MemoryStore: Block broadcast_6 stored as values in
memory (estimated size 32.0 KiB, free 2.4 GiB)
26/02/15 15:16:01 INFO MemoryStore: Block broadcast_6_piece0 stored as bytes
in memory (estimated size 29.9 KiB, free 2.4 GiB)
26/02/15 15:16:01 INFO BlockManagerInfo: Added broadcast_6_piece0 in memory
on 192.168.100.32:49662 (size: 29.9 KiB, free: 2.4 GiB)
26/02/15 15:16:01 INFO SparkContext: Created broadcast 6 from collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:82
26/02/15 15:16:01 INFO MemoryStore: Block broadcast_7 stored as values in
memory (estimated size 32.0 KiB, free 2.4 GiB)
26/02/15 15:16:01 INFO MemoryStore: Block broadcast_7_piece0 stored as bytes
in memory (estimated size 29.9 KiB, free 2.4 GiB)
26/02/15 15:16:01 INFO BlockManagerInfo: Added broadcast_7_piece0 in memory
on 192.168.100.32:49662 (size: 29.9 KiB, free: 2.4 GiB)
26/02/15 15:16:01 INFO SparkContext: Created broadcast 7 from collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:82
26/02/15 15:16:01 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
[message repeated 7 more times]
26/02/15 15:16:01 WARN GlutenFallbackReporter: Validation failed for plan:
Project[QueryId=3], due to: fallback input file expression
26/02/15 15:16:01 INFO CodeGenerator: Code generated in 4.856583 ms
26/02/15 15:16:01 INFO MemoryStore: Block broadcast_8 stored as values in
memory (estimated size 32.0 KiB, free 2.4 GiB)
26/02/15 15:16:01 INFO MemoryStore: Block broadcast_8_piece0 stored as bytes
in memory (estimated size 29.9 KiB, free 2.4 GiB)
26/02/15 15:16:01 INFO BlockManagerInfo: Added broadcast_8_piece0 in memory
on 192.168.100.32:49662 (size: 29.9 KiB, free: 2.4 GiB)
26/02/15 15:16:01 INFO SparkContext: Created broadcast 8 from collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:82
26/02/15 15:16:01 INFO SparkContext: Starting job: collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:82
26/02/15 15:16:01 INFO DAGScheduler: Got job 2 (collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:82)
with 1 output partitions
26/02/15 15:16:01 INFO DAGScheduler: Final stage: ResultStage 2 (collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:82)
26/02/15 15:16:01 INFO DAGScheduler: Parents of final stage: List()
26/02/15 15:16:01 INFO DAGScheduler: Missing parents: List()
26/02/15 15:16:01 INFO DAGScheduler: Submitting ResultStage 2
(MapPartitionsRDD[9] at collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:82),
which has no missing parents
26/02/15 15:16:01 INFO MemoryStore: Block broadcast_9 stored as values in
memory (estimated size 16.7 KiB, free 2.4 GiB)
26/02/15 15:16:01 INFO MemoryStore: Block broadcast_9_piece0 stored as bytes
in memory (estimated size 7.0 KiB, free 2.4 GiB)
26/02/15 15:16:01 INFO BlockManagerInfo: Added broadcast_9_piece0 in memory
on 192.168.100.32:49662 (size: 7.0 KiB, free: 2.4 GiB)
26/02/15 15:16:01 INFO SparkContext: Created broadcast 9 from broadcast at
DAGScheduler.scala:1585
26/02/15 15:16:01 INFO DAGScheduler: Submitting 1 missing tasks from
ResultStage 2 (MapPartitionsRDD[9] at collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:82)
(first 15 tasks are for partitions Vector(0))
26/02/15 15:16:01 INFO TaskSchedulerImpl: Adding task set 2.0 with 1 tasks
resource profile 0
26/02/15 15:16:01 INFO TaskSetManager: Starting task 0.0 in stage 2.0 (TID
4) (192.168.100.32, executor driver, partition 0, PROCESS_LOCAL, 11473 bytes)
26/02/15 15:16:01 INFO Executor: Running task 0.0 in stage 2.0 (TID 4)
26/02/15 15:16:01 INFO CodeGenerator: Code generated in 4.496625 ms
26/02/15 15:16:01 INFO Executor: Finished task 0.0 in stage 2.0 (TID 4).
7037 bytes result sent to driver
26/02/15 15:16:01 INFO TaskSetManager: Finished task 0.0 in stage 2.0 (TID
4) in 14 ms on 192.168.100.32 (executor driver) (1/1)
26/02/15 15:16:01 INFO TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks
have all completed, from pool
26/02/15 15:16:01 INFO DAGScheduler: ResultStage 2 (collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:82)
finished in 0.017 s
26/02/15 15:16:01 INFO DAGScheduler: Job 2 is finished. Cancelling potential
speculative or zombie tasks for this job
26/02/15 15:16:01 INFO TaskSchedulerImpl: Killing all running tasks in stage
2: Stage finished
26/02/15 15:16:01 INFO DAGScheduler: Job 2 finished: collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:82,
took 0.018839 s
ID: 1, Name: Alice, Block Start: -1
ID: 2, Name: Bob, Block Start: -1
ID: 3, Name: Charlie, Block Start: -1
❌ BUG: Some rows have invalid block start positions!
================================================================================
Testing input_file_block_length() on Iceberg table
================================================================================
=== input_file_block_length() Results ===
26/02/15 15:16:01 INFO V2ScanRelationPushDown:
Output: id#39, name#40
26/02/15 15:16:01 INFO SnapshotScan: Scanning table local.default.test_table
snapshot 8759041077900200141 created at 2026-02-15T12:16:00.623+00:00 with
filter true
26/02/15 15:16:01 INFO BaseDistributedDataScan: Planning file tasks locally
for table local.default.test_table
26/02/15 15:16:01 INFO SparkPartitioningAwareScan: Reporting
UnknownPartitioning with 1 partition(s) for table local.default.test_table
26/02/15 15:16:01 INFO MemoryStore: Block broadcast_10 stored as values in
memory (estimated size 32.0 KiB, free 2.4 GiB)
26/02/15 15:16:01 INFO MemoryStore: Block broadcast_10_piece0 stored as
bytes in memory (estimated size 29.9 KiB, free 2.4 GiB)
26/02/15 15:16:01 INFO BlockManagerInfo: Added broadcast_10_piece0 in memory
on 192.168.100.32:49662 (size: 29.9 KiB, free: 2.4 GiB)
26/02/15 15:16:01 INFO SparkContext: Created broadcast 10 from collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:106
26/02/15 15:16:01 INFO MemoryStore: Block broadcast_11 stored as values in
memory (estimated size 32.0 KiB, free 2.4 GiB)
26/02/15 15:16:01 INFO MemoryStore: Block broadcast_11_piece0 stored as
bytes in memory (estimated size 29.9 KiB, free 2.4 GiB)
26/02/15 15:16:01 INFO BlockManagerInfo: Added broadcast_11_piece0 in memory
on 192.168.100.32:49662 (size: 29.9 KiB, free: 2.4 GiB)
26/02/15 15:16:01 INFO SparkContext: Created broadcast 11 from collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:106
26/02/15 15:16:01 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:01 WARN GlutenFallbackReporter: Validation failed for plan:
Project[QueryId=4], due to: fallback input file expression
26/02/15 15:16:01 INFO CodeGenerator: Code generated in 5.000667 ms
26/02/15 15:16:01 INFO MemoryStore: Block broadcast_12 stored as values in
memory (estimated size 32.0 KiB, free 2.4 GiB)
26/02/15 15:16:01 INFO MemoryStore: Block broadcast_12_piece0 stored as
bytes in memory (estimated size 29.9 KiB, free 2.4 GiB)
26/02/15 15:16:01 INFO BlockManagerInfo: Added broadcast_12_piece0 in memory
on 192.168.100.32:49662 (size: 29.9 KiB, free: 2.4 GiB)
26/02/15 15:16:01 INFO SparkContext: Created broadcast 12 from collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:106
26/02/15 15:16:01 INFO SparkContext: Starting job: collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:106
26/02/15 15:16:01 INFO DAGScheduler: Got job 3 (collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:106)
with 1 output partitions
26/02/15 15:16:01 INFO DAGScheduler: Final stage: ResultStage 3 (collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:106)
26/02/15 15:16:01 INFO DAGScheduler: Parents of final stage: List()
26/02/15 15:16:01 INFO DAGScheduler: Missing parents: List()
26/02/15 15:16:01 INFO DAGScheduler: Submitting ResultStage 3
(MapPartitionsRDD[13] at collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:106),
which has no missing parents
26/02/15 15:16:01 INFO MemoryStore: Block broadcast_13 stored as values in
memory (estimated size 16.7 KiB, free 2.4 GiB)
26/02/15 15:16:01 INFO MemoryStore: Block broadcast_13_piece0 stored as
bytes in memory (estimated size 7.0 KiB, free 2.4 GiB)
26/02/15 15:16:01 INFO BlockManagerInfo: Added broadcast_13_piece0 in memory
on 192.168.100.32:49662 (size: 7.0 KiB, free: 2.4 GiB)
26/02/15 15:16:01 INFO SparkContext: Created broadcast 13 from broadcast at
DAGScheduler.scala:1585
26/02/15 15:16:01 INFO DAGScheduler: Submitting 1 missing tasks from
ResultStage 3 (MapPartitionsRDD[13] at collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:106)
(first 15 tasks are for partitions Vector(0))
26/02/15 15:16:01 INFO TaskSchedulerImpl: Adding task set 3.0 with 1 tasks
resource profile 0
26/02/15 15:16:01 INFO TaskSetManager: Starting task 0.0 in stage 3.0 (TID
5) (192.168.100.32, executor driver, partition 0, PROCESS_LOCAL, 11473 bytes)
26/02/15 15:16:01 INFO Executor: Running task 0.0 in stage 3.0 (TID 5)
26/02/15 15:16:01 INFO BlockManagerInfo: Removed broadcast_9_piece0 on
192.168.100.32:49662 in memory (size: 7.0 KiB, free: 2.4 GiB)
26/02/15 15:16:01 INFO CodeGenerator: Code generated in 4.27675 ms
26/02/15 15:16:01 INFO BlockManagerInfo: Removed broadcast_3_piece0 on
192.168.100.32:49662 in memory (size: 29.9 KiB, free: 2.4 GiB)
26/02/15 15:16:01 INFO BlockManagerInfo: Removed broadcast_1_piece0 on
192.168.100.32:49662 in memory (size: 4.4 KiB, free: 2.4 GiB)
26/02/15 15:16:01 INFO BlockManagerInfo: Removed broadcast_11_piece0 on
192.168.100.32:49662 in memory (size: 29.9 KiB, free: 2.4 GiB)
26/02/15 15:16:01 INFO Executor: Finished task 0.0 in stage 3.0 (TID 5).
7037 bytes result sent to driver
26/02/15 15:16:01 INFO BlockManagerInfo: Removed broadcast_7_piece0 on
192.168.100.32:49662 in memory (size: 29.9 KiB, free: 2.4 GiB)
26/02/15 15:16:01 INFO TaskSetManager: Finished task 0.0 in stage 3.0 (TID
5) in 12 ms on 192.168.100.32 (executor driver) (1/1)
26/02/15 15:16:01 INFO TaskSchedulerImpl: Removed TaskSet 3.0, whose tasks
have all completed, from pool
26/02/15 15:16:01 INFO DAGScheduler: ResultStage 3 (collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:106)
finished in 0.020 s
26/02/15 15:16:01 INFO DAGScheduler: Job 3 is finished. Cancelling potential
speculative or zombie tasks for this job
26/02/15 15:16:01 INFO TaskSchedulerImpl: Killing all running tasks in stage
3: Stage finished
26/02/15 15:16:01 INFO DAGScheduler: Job 3 finished: collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:106,
took 0.020872 s
26/02/15 15:16:01 INFO BlockManagerInfo: Removed broadcast_0_piece0 on
192.168.100.32:49662 in memory (size: 29.7 KiB, free: 2.4 GiB)
26/02/15 15:16:01 INFO BlockManagerInfo: Removed broadcast_5_piece0 on
192.168.100.32:49662 in memory (size: 7.0 KiB, free: 2.4 GiB)
ID: 1, Name: Alice, Block Length: -1
ID: 2, Name: Bob, Block Length: -1
ID: 3, Name: Charlie, Block Length: -1
❌ BUG: Some rows have invalid block lengths!
================================================================================
Testing all three metadata functions together
================================================================================
=== All Metadata Functions Results ===
26/02/15 15:16:01 INFO V2ScanRelationPushDown:
Output: id#57, name#58
26/02/15 15:16:01 INFO SnapshotScan: Scanning table local.default.test_table
snapshot 8759041077900200141 created at 2026-02-15T12:16:00.623+00:00 with
filter true
26/02/15 15:16:01 INFO BaseDistributedDataScan: Planning file tasks locally
for table local.default.test_table
26/02/15 15:16:01 INFO SparkPartitioningAwareScan: Reporting
UnknownPartitioning with 1 partition(s) for table local.default.test_table
26/02/15 15:16:01 INFO MemoryStore: Block broadcast_14 stored as values in
memory (estimated size 32.0 KiB, free 2.4 GiB)
26/02/15 15:16:01 INFO MemoryStore: Block broadcast_14_piece0 stored as
bytes in memory (estimated size 29.9 KiB, free 2.4 GiB)
26/02/15 15:16:01 INFO BlockManagerInfo: Added broadcast_14_piece0 in memory
on 192.168.100.32:49662 (size: 29.9 KiB, free: 2.4 GiB)
26/02/15 15:16:01 INFO SparkContext: Created broadcast 14 from collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:132
26/02/15 15:16:01 INFO MemoryStore: Block broadcast_15 stored as values in
memory (estimated size 32.0 KiB, free 2.4 GiB)
26/02/15 15:16:01 INFO MemoryStore: Block broadcast_15_piece0 stored as
bytes in memory (estimated size 29.9 KiB, free 2.4 GiB)
26/02/15 15:16:01 INFO BlockManagerInfo: Added broadcast_15_piece0 in memory
on 192.168.100.32:49662 (size: 29.9 KiB, free: 2.4 GiB)
26/02/15 15:16:01 INFO SparkContext: Created broadcast 15 from collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:132
26/02/15 15:16:01 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:01 WARN GlutenFallbackReporter: Validation failed for plan:
Project[QueryId=5], due to: fallback input file expression
26/02/15 15:16:01 INFO CodeGenerator: Code generated in 7.722917 ms
26/02/15 15:16:01 INFO MemoryStore: Block broadcast_16 stored as values in
memory (estimated size 32.0 KiB, free 2.4 GiB)
26/02/15 15:16:01 INFO MemoryStore: Block broadcast_16_piece0 stored as
bytes in memory (estimated size 29.9 KiB, free 2.4 GiB)
26/02/15 15:16:01 INFO BlockManagerInfo: Added broadcast_16_piece0 in memory
on 192.168.100.32:49662 (size: 29.9 KiB, free: 2.4 GiB)
26/02/15 15:16:01 INFO SparkContext: Created broadcast 16 from collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:132
26/02/15 15:16:01 INFO SparkContext: Starting job: collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:132
26/02/15 15:16:01 INFO DAGScheduler: Got job 4 (collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:132)
with 1 output partitions
26/02/15 15:16:01 INFO DAGScheduler: Final stage: ResultStage 4 (collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:132)
26/02/15 15:16:01 INFO DAGScheduler: Parents of final stage: List()
26/02/15 15:16:01 INFO DAGScheduler: Missing parents: List()
26/02/15 15:16:01 INFO DAGScheduler: Submitting ResultStage 4
(MapPartitionsRDD[17] at collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:132),
which has no missing parents
26/02/15 15:16:01 INFO MemoryStore: Block broadcast_17 stored as values in
memory (estimated size 17.0 KiB, free 2.4 GiB)
26/02/15 15:16:01 INFO MemoryStore: Block broadcast_17_piece0 stored as
bytes in memory (estimated size 7.1 KiB, free 2.4 GiB)
26/02/15 15:16:01 INFO BlockManagerInfo: Added broadcast_17_piece0 in memory
on 192.168.100.32:49662 (size: 7.1 KiB, free: 2.4 GiB)
26/02/15 15:16:01 INFO SparkContext: Created broadcast 17 from broadcast at
DAGScheduler.scala:1585
26/02/15 15:16:01 INFO DAGScheduler: Submitting 1 missing tasks from
ResultStage 4 (MapPartitionsRDD[17] at collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:132)
(first 15 tasks are for partitions Vector(0))
26/02/15 15:16:01 INFO TaskSchedulerImpl: Adding task set 4.0 with 1 tasks
resource profile 0
26/02/15 15:16:01 INFO TaskSetManager: Starting task 0.0 in stage 4.0 (TID
6) (192.168.100.32, executor driver, partition 0, PROCESS_LOCAL, 11473 bytes)
26/02/15 15:16:01 INFO Executor: Running task 0.0 in stage 4.0 (TID 6)
26/02/15 15:16:01 INFO CodeGenerator: Code generated in 4.799166 ms
26/02/15 15:16:01 INFO Executor: Finished task 0.0 in stage 4.0 (TID 6).
7049 bytes result sent to driver
26/02/15 15:16:01 INFO TaskSetManager: Finished task 0.0 in stage 4.0 (TID
6) in 23 ms on 192.168.100.32 (executor driver) (1/1)
26/02/15 15:16:01 INFO TaskSchedulerImpl: Removed TaskSet 4.0, whose tasks
have all completed, from pool
26/02/15 15:16:01 INFO DAGScheduler: ResultStage 4 (collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:132)
finished in 0.027 s
26/02/15 15:16:01 INFO DAGScheduler: Job 4 is finished. Cancelling potential
speculative or zombie tasks for this job
26/02/15 15:16:01 INFO TaskSchedulerImpl: Killing all running tasks in stage
4: Stage finished
26/02/15 15:16:01 INFO DAGScheduler: Job 4 finished: collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:132,
took 0.029916 s
ID: 1, Name: Alice
File: ''
Block Start: -1
Block Length: -1
ID: 2, Name: Bob
File: ''
Block Start: -1
Block Length: -1
ID: 3, Name: Charlie
File: ''
Block Start: -1
Block Length: -1
❌ SOME TESTS FAILED: Check the output above for details
26/02/15 15:16:01 INFO SparkContext: SparkContext is stopping with exitCode
0.
26/02/15 15:16:01 INFO SparkUI: Stopped Spark web UI at
http://192.168.100.32:4040
26/02/15 15:16:01 INFO MapOutputTrackerMasterEndpoint:
MapOutputTrackerMasterEndpoint stopped!
26/02/15 15:16:01 INFO MemoryStore: MemoryStore cleared
26/02/15 15:16:01 INFO BlockManager: BlockManager stopped
26/02/15 15:16:01 INFO BlockManagerMaster: BlockManagerMaster stopped
26/02/15 15:16:01 INFO
OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:
OutputCommitCoordinator stopped!
26/02/15 15:16:01 INFO SparkContext: Successfully stopped SparkContext
26/02/15 15:16:02 INFO ShutdownHookManager: Shutdown hook called
26/02/15 15:16:02 INFO ShutdownHookManager: Deleting directory
/private/var/folders/5z/4mxhbysx1hb00rzxt6wj738m0000gn/T/spark-63dfb5d6-fcec-4e7f-a9a3-af9007b62490
26/02/15 15:16:02 INFO ShutdownHookManager: Deleting directory
/private/var/folders/5z/4mxhbysx1hb00rzxt6wj738m0000gn/T/spark-d0203fd8-92c9-42cb-a2b3-e3437e2b4a37
26/02/15 15:16:02 INFO ShutdownHookManager: Deleting directory
/private/var/folders/5z/4mxhbysx1hb00rzxt6wj738m0000gn/T/spark-63dfb5d6-fcec-4e7f-a9a3-af9007b62490/pyspark-11640bfd-7a71-41d6-8134-9eca7086ac34
[Process completed]
```
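To summarize the "Before" run: `input_file_name()` returns `''` and `input_file_block_start()`/`input_file_block_length()` return `-1` for every row, which is what Spark reports when the reader never populates per-row input file metadata. The pass/fail check the script prints boils down to something like the following standalone sketch (hypothetical helper names, not the actual `test_iceberg_simple.py` code):

```python
def validate_file_metadata(rows):
    """Collect problems per row. Spark yields '' for input_file_name() and
    -1 for input_file_block_start()/input_file_block_length() when the
    scan does not expose input file metadata."""
    problems = []
    for row in rows:
        if row.get("file", "") == "":
            problems.append(f"ID {row['id']}: empty input_file_name()")
        if row.get("block_start", -1) < 0:
            problems.append(f"ID {row['id']}: invalid block start {row['block_start']}")
        if row.get("block_length", -1) <= 0:
            problems.append(f"ID {row['id']}: invalid block length {row['block_length']}")
    return problems

# Rows as printed in the "Before" run above:
before = [
    {"id": 1, "file": "", "block_start": -1, "block_length": -1},
    {"id": 2, "file": "", "block_start": -1, "block_length": -1},
    {"id": 3, "file": "", "block_start": -1, "block_length": -1},
]
print(len(validate_file_metadata(before)))  # 9 problems -> test fails
```

With the fix applied (the "After" run below), each row should instead carry a non-empty file path and non-negative block offsets, so the helper returns an empty list.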
### After
```
Last login: Sun Feb 15 15:15:06 on ttys072
(3.13.3) ➜ incubator-gluten git:(main) ✗ export
GLUTEN_JAR=/Users/reema/Desktop/OpenSource/incubator-gluten/package/target/gluten-velox-bundle-spark3.5_2.12-darwin_aarch64-1.7.0-iceberg-fix.jar
export ICEBERG_JAR=/tmp/iceberg.jar
spark-submit \
--jars "$GLUTEN_JAR,$ICEBERG_JAR" \
--conf spark.plugins=org.apache.gluten.GlutenPlugin \
--conf spark.gluten.sql.columnar.backend.lib=velox \
--conf spark.gluten.enabled=true \
--conf spark.driver.extraClassPath="$GLUTEN_JAR:$ICEBERG_JAR" \
--conf spark.executor.extraClassPath="$GLUTEN_JAR:$ICEBERG_JAR" \
test_iceberg_simple.py
26/02/15 15:16:15 WARN Utils: Your hostname, Reemas-MacBook-Pro.local
resolves to a loopback address: 127.0.0.1; using 192.168.100.32 instead (on
interface en0)
26/02/15 15:16:15 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to
another address
26/02/15 15:16:16 WARN NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
26/02/15 15:16:16 INFO SparkContext: Running Spark version 3.5.5
26/02/15 15:16:16 INFO SparkContext: OS info Mac OS X, 15.6, aarch64
26/02/15 15:16:16 INFO SparkContext: Java version 17.0.18
26/02/15 15:16:16 INFO ResourceUtils:
==============================================================
26/02/15 15:16:16 INFO ResourceUtils: No custom resources configured for
spark.driver.
26/02/15 15:16:16 INFO ResourceUtils:
==============================================================
26/02/15 15:16:16 INFO SparkContext: Submitted application:
iceberg-input-file-metadata-test
26/02/15 15:16:16 INFO ResourceProfile: Default ResourceProfile created,
executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: ,
memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name:
offHeap, amount: 2048, script: , vendor: ), task resources: Map(cpus -> name:
cpus, amount: 1.0)
26/02/15 15:16:16 INFO ResourceProfile: Limiting resource is cpu
26/02/15 15:16:16 INFO ResourceProfileManager: Added ResourceProfile id: 0
26/02/15 15:16:16 INFO SecurityManager: Changing view acls to: reema
26/02/15 15:16:16 INFO SecurityManager: Changing modify acls to: reema
26/02/15 15:16:16 INFO SecurityManager: Changing view acls groups to:
26/02/15 15:16:16 INFO SecurityManager: Changing modify acls groups to:
26/02/15 15:16:16 INFO SecurityManager: SecurityManager: authentication
disabled; ui acls disabled; users with view permissions: reema; groups with
view permissions: EMPTY; users with modify permissions: reema; groups with
modify permissions: EMPTY
26/02/15 15:16:17 INFO Utils: Successfully started service 'sparkDriver' on
port 49683.
26/02/15 15:16:17 INFO SparkEnv: Registering MapOutputTracker
26/02/15 15:16:17 INFO SparkEnv: Registering BlockManagerMaster
26/02/15 15:16:17 INFO BlockManagerMasterEndpoint: Using
org.apache.spark.storage.DefaultTopologyMapper for getting topology information
26/02/15 15:16:17 INFO BlockManagerMasterEndpoint:
BlockManagerMasterEndpoint up
26/02/15 15:16:17 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
26/02/15 15:16:17 INFO DiskBlockManager: Created local directory at
/private/var/folders/5z/4mxhbysx1hb00rzxt6wj738m0000gn/T/blockmgr-28fb4afe-55ae-4ae3-b6bb-b9ce02a8e490
26/02/15 15:16:17 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:17 INFO SparkEnv: Registering OutputCommitCoordinator
26/02/15 15:16:17 INFO JettyUtils: Start Jetty 0.0.0.0:4040 for SparkUI
26/02/15 15:16:17 INFO Utils: Successfully started service 'SparkUI' on port
4040.
26/02/15 15:16:17 INFO SparkContext: Added JAR
file:///Users/reema/Desktop/OpenSource/incubator-gluten/package/target/gluten-velox-bundle-spark3.5_2.12-darwin_aarch64-1.7.0-iceberg-fix.jar
at
spark://192.168.100.32:49683/jars/gluten-velox-bundle-spark3.5_2.12-darwin_aarch64-1.7.0-iceberg-fix.jar
with timestamp 1771157776861
26/02/15 15:16:17 INFO SparkContext: Added JAR
file:///private/tmp/iceberg.jar at
spark://192.168.100.32:49683/jars/iceberg.jar with timestamp 1771157776861
26/02/15 15:16:17 INFO Discovery: Start discovering components in the
current classpath...
26/02/15 15:16:17 INFO Discovery: Discovered component files:
org.apache.gluten.backendsapi.velox.VeloxBackend,
org.apache.gluten.component.VeloxIcebergComponent. Duration: 4 ms.
26/02/15 15:16:17 INFO package: Components registered within order: velox,
velox-iceberg
26/02/15 15:16:17 INFO GlutenDriverPlugin: Gluten components:
==============================================================
Component velox
velox_branch = HEAD
velox_revision = f247a8e922c4802fd9b9cf7a626421bff9b803fd
velox_revisionTime = 2026-02-07 14:11:45 +0000
Component velox-iceberg
==============================================================
26/02/15 15:16:17 INFO SubstraitBackend: Gluten build info:
==============================================================
Gluten Version: 1.7.0-SNAPSHOT
GCC Version:
Java Version: 17
Scala Version: 2.12.15
Spark Version: 3.5.5
Hadoop Version: 2.7.4
Gluten Branch: iceberg-input-file
Gluten Revision: bdb1f9117dc415d0c42c89fbd5533844bfa17b85
Gluten Revision Time: 2026-02-15 01:29:56 +0300
Gluten Build Time: 2026-02-15T11:32:58Z
Gluten Repo URL: https://github.com/ReemaAlzaid/incubator-gluten.git
==============================================================
26/02/15 15:16:17 INFO VeloxListenerApi: Memory overhead is not set. Setting
it to 644245094 automatically. Gluten doesn't follow Spark's calculation on
default value of this option because the actual required memory overhead will
depend on off-heap usage than on on-heap usage.
26/02/15 15:16:17 INFO SparkDirectoryUtil: Created local directory at
/private/var/folders/5z/4mxhbysx1hb00rzxt6wj738m0000gn/T/gluten-87fb8dd4-e5f3-48f1-8ab9-480ad6d7553e
26/02/15 15:16:17 INFO JniWorkspace: Creating JNI workspace in root
directory
/private/var/folders/5z/4mxhbysx1hb00rzxt6wj738m0000gn/T/gluten-87fb8dd4-e5f3-48f1-8ab9-480ad6d7553e/jni/a633327e-1f81-4a3f-8c1b-cf20ab0d3f27
26/02/15 15:16:17 INFO JniWorkspace: JNI workspace
/private/var/folders/5z/4mxhbysx1hb00rzxt6wj738m0000gn/T/gluten-87fb8dd4-e5f3-48f1-8ab9-480ad6d7553e/jni/a633327e-1f81-4a3f-8c1b-cf20ab0d3f27/gluten-7854201777374203019
created in root directory
/private/var/folders/5z/4mxhbysx1hb00rzxt6wj738m0000gn/T/gluten-87fb8dd4-e5f3-48f1-8ab9-480ad6d7553e/jni/a633327e-1f81-4a3f-8c1b-cf20ab0d3f27
26/02/15 15:16:17 INFO JniLibLoader: Read real path
/private/var/folders/5z/4mxhbysx1hb00rzxt6wj738m0000gn/T/gluten-87fb8dd4-e5f3-48f1-8ab9-480ad6d7553e/jni/a633327e-1f81-4a3f-8c1b-cf20ab0d3f27/gluten-7854201777374203019/darwin/aarch64/libgluten.dylib
for libPath
/private/var/folders/5z/4mxhbysx1hb00rzxt6wj738m0000gn/T/gluten-87fb8dd4-e5f3-48f1-8ab9-480ad6d7553e/jni/a633327e-1f81-4a3f-8c1b-cf20ab0d3f27/gluten-7854201777374203019/darwin/aarch64/libgluten.dylib
26/02/15 15:16:17 INFO JniLibLoader: Library
/private/var/folders/5z/4mxhbysx1hb00rzxt6wj738m0000gn/T/gluten-87fb8dd4-e5f3-48f1-8ab9-480ad6d7553e/jni/a633327e-1f81-4a3f-8c1b-cf20ab0d3f27/gluten-7854201777374203019/darwin/aarch64/libgluten.dylib
has been loaded using path-loading method
26/02/15 15:16:17 INFO JniLibLoader: Library
/private/var/folders/5z/4mxhbysx1hb00rzxt6wj738m0000gn/T/gluten-87fb8dd4-e5f3-48f1-8ab9-480ad6d7553e/jni/a633327e-1f81-4a3f-8c1b-cf20ab0d3f27/gluten-7854201777374203019/darwin/aarch64/libgluten.dylib
has been loaded
26/02/15 15:16:17 INFO JniLibLoader: Successfully loaded library
darwin/aarch64/libgluten.dylib
26/02/15 15:16:17 INFO JniLibLoader: Read real path
/private/var/folders/5z/4mxhbysx1hb00rzxt6wj738m0000gn/T/gluten-87fb8dd4-e5f3-48f1-8ab9-480ad6d7553e/jni/a633327e-1f81-4a3f-8c1b-cf20ab0d3f27/gluten-7854201777374203019/darwin/aarch64/libvelox.dylib
for libPath
/private/var/folders/5z/4mxhbysx1hb00rzxt6wj738m0000gn/T/gluten-87fb8dd4-e5f3-48f1-8ab9-480ad6d7553e/jni/a633327e-1f81-4a3f-8c1b-cf20ab0d3f27/gluten-7854201777374203019/darwin/aarch64/libvelox.dylib
26/02/15 15:16:17 INFO JniLibLoader: Library
/private/var/folders/5z/4mxhbysx1hb00rzxt6wj738m0000gn/T/gluten-87fb8dd4-e5f3-48f1-8ab9-480ad6d7553e/jni/a633327e-1f81-4a3f-8c1b-cf20ab0d3f27/gluten-7854201777374203019/darwin/aarch64/libvelox.dylib
has been loaded using path-loading method
26/02/15 15:16:17 INFO JniLibLoader: Library
/private/var/folders/5z/4mxhbysx1hb00rzxt6wj738m0000gn/T/gluten-87fb8dd4-e5f3-48f1-8ab9-480ad6d7553e/jni/a633327e-1f81-4a3f-8c1b-cf20ab0d3f27/gluten-7854201777374203019/darwin/aarch64/libvelox.dylib
has been loaded
26/02/15 15:16:17 INFO JniLibLoader: Successfully loaded library
darwin/aarch64/libvelox.dylib
W20260215 15:16:17.696556 14493069 MemoryArbitrator.cpp:84] Query memory
capacity[460.50MB] is set for NOOP arbitrator which has no capacity enforcement
26/02/15 15:16:17 INFO DriverPluginContainer: Initialized driver component
for plugin org.apache.gluten.GlutenPlugin.
26/02/15 15:16:17 INFO Executor: Starting executor ID driver on host
192.168.100.32
26/02/15 15:16:17 INFO Executor: OS info Mac OS X, 15.6, aarch64
26/02/15 15:16:17 INFO Executor: Java version 17.0.18
26/02/15 15:16:17 INFO Executor: Starting executor with user classpath
(userClassPathFirst = false):
'file:/Users/reema/Desktop/OpenSource/incubator-gluten/package/target/gluten-velox-bundle-spark3.5_2.12-darwin_aarch64-1.7.0-iceberg-fix.jar,file:/tmp/iceberg.jar,file:/Users/reema/Desktop/OpenSource/incubator-gluten/gluten-velox-bundle-spark3.5_2.12-darwin_aarch64-1.7.0-iceberg-fix.jar,file:/Users/reema/Desktop/OpenSource/incubator-gluten/iceberg.jar'
26/02/15 15:16:17 INFO Executor: Created or updated repl class loader
org.apache.spark.util.MutableURLClassLoader@3a7933da for default.
26/02/15 15:16:17 INFO CodedInputStreamClassInitializer: The
defaultRecursionLimit in protobuf has been increased to 100000
26/02/15 15:16:17 INFO VeloxListenerApi: Gluten is running with Spark local
mode. Skip running static initializer for executor.
26/02/15 15:16:17 INFO ExecutorPluginContainer: Initialized executor
component for plugin org.apache.gluten.GlutenPlugin.
26/02/15 15:16:17 INFO Utils: Successfully started service
'org.apache.spark.network.netty.NettyBlockTransferService' on port 49684.
26/02/15 15:16:17 INFO NettyBlockTransferService: Server created on
192.168.100.32:49684
26/02/15 15:16:17 INFO BlockManager: Using
org.apache.spark.storage.RandomBlockReplicationPolicy for block replication
policy
26/02/15 15:16:17 INFO BlockManagerMaster: Registering BlockManager
BlockManagerId(driver, 192.168.100.32, 49684, None)
26/02/15 15:16:17 INFO BlockManagerMasterEndpoint: Registering block manager
192.168.100.32:49684 with 2.4 GiB RAM, BlockManagerId(driver, 192.168.100.32,
49684, None)
26/02/15 15:16:17 INFO BlockManagerMaster: Registered BlockManager
BlockManagerId(driver, 192.168.100.32, 49684, None)
26/02/15 15:16:17 INFO BlockManager: Initialized BlockManager:
BlockManagerId(driver, 192.168.100.32, 49684, None)
26/02/15 15:16:17 INFO VeloxBackend: Gluten SQL Tab has been attached.
26/02/15 15:16:17 INFO SparkShimLoader: Loading Spark Shims for version:
3.5.5
26/02/15 15:16:17 INFO SparkShimLoader: Using Shim provider:
List(org.apache.gluten.sql.shims.spark35.SparkShimProvider@4d028882)
================================================================================
Creating Iceberg table...
================================================================================
26/02/15 15:16:17 INFO SharedState: Setting hive.metastore.warehouse.dir
('null') to the value of spark.sql.warehouse.dir.
26/02/15 15:16:17 INFO SharedState: Warehouse path is
'file:/Users/reema/Desktop/OpenSource/incubator-gluten/spark-warehouse'.
26/02/15 15:16:18 INFO CatalogUtil: Loading custom FileIO implementation:
org.apache.iceberg.hadoop.HadoopFileIO
26/02/15 15:16:18 INFO BaseMetastoreCatalog: Table properties set at catalog
level through catalog properties: {}
26/02/15 15:16:18 INFO BaseMetastoreCatalog: Table properties enforced at
catalog level through catalog properties: {}
26/02/15 15:16:18 INFO HadoopTableOperations: Committed a new metadata file
file:/tmp/iceberg_warehouse/default/test_table/metadata/v1.metadata.json
26/02/15 15:16:19 WARN GlutenFallbackReporter: Validation failed for plan:
AppendData[QueryId=1], due to: [FallbackByBackendSettings] Validation failed on
node AppendData
26/02/15 15:16:19 INFO CodeGenerator: Code generated in 84.452875 ms
26/02/15 15:16:19 INFO MemoryStore: Block broadcast_0 stored as values in
memory (estimated size 32.0 KiB, free 2.4 GiB)
26/02/15 15:16:19 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes
in memory (estimated size 29.7 KiB, free 2.4 GiB)
26/02/15 15:16:19 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory
on 192.168.100.32:49684 (size: 29.7 KiB, free: 2.4 GiB)
26/02/15 15:16:19 INFO SparkContext: Created broadcast 0 from broadcast at
SparkWrite.java:195
26/02/15 15:16:19 INFO AppendDataExec: Start processing data source write
support: IcebergBatchWrite(table=local.default.test_table, format=PARQUET). The
input RDD has 3 partitions.
26/02/15 15:16:19 INFO SparkContext: Starting job: sql at
NativeMethodAccessorImpl.java:0
26/02/15 15:16:19 INFO DAGScheduler: Got job 0 (sql at
NativeMethodAccessorImpl.java:0) with 3 output partitions
26/02/15 15:16:19 INFO DAGScheduler: Final stage: ResultStage 0 (sql at
NativeMethodAccessorImpl.java:0)
26/02/15 15:16:19 INFO DAGScheduler: Parents of final stage: List()
26/02/15 15:16:19 INFO DAGScheduler: Missing parents: List()
26/02/15 15:16:19 INFO DAGScheduler: Submitting ResultStage 0
(MapPartitionsRDD[1] at sql at NativeMethodAccessorImpl.java:0), which has no
missing parents
26/02/15 15:16:19 INFO MemoryStore: Block broadcast_1 stored as values in
memory (estimated size 7.8 KiB, free 2.4 GiB)
26/02/15 15:16:19 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes
in memory (estimated size 4.4 KiB, free 2.4 GiB)
26/02/15 15:16:19 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory
on 192.168.100.32:49684 (size: 4.4 KiB, free: 2.4 GiB)
26/02/15 15:16:19 INFO SparkContext: Created broadcast 1 from broadcast at
DAGScheduler.scala:1585
26/02/15 15:16:19 INFO DAGScheduler: Submitting 3 missing tasks from
ResultStage 0 (MapPartitionsRDD[1] at sql at NativeMethodAccessorImpl.java:0)
(first 15 tasks are for partitions Vector(0, 1, 2))
26/02/15 15:16:19 INFO TaskSchedulerImpl: Adding task set 0.0 with 3 tasks
resource profile 0
26/02/15 15:16:19 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID
0) (192.168.100.32, executor driver, partition 0, PROCESS_LOCAL, 9506 bytes)
26/02/15 15:16:19 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID
1) (192.168.100.32, executor driver, partition 1, PROCESS_LOCAL, 9506 bytes)
26/02/15 15:16:19 INFO TaskSetManager: Starting task 2.0 in stage 0.0 (TID
2) (192.168.100.32, executor driver, partition 2, PROCESS_LOCAL, 9506 bytes)
26/02/15 15:16:19 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
26/02/15 15:16:19 INFO Executor: Running task 2.0 in stage 0.0 (TID 2)
26/02/15 15:16:19 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
26/02/15 15:16:19 INFO CodecPool: Got brand-new compressor [.zstd]
26/02/15 15:16:19 INFO CodecPool: Got brand-new compressor [.zstd]
26/02/15 15:16:19 INFO CodecPool: Got brand-new compressor [.zstd]
26/02/15 15:16:19 INFO DataWritingSparkTask: Writer for partition 1 is
committing.
26/02/15 15:16:19 INFO DataWritingSparkTask: Writer for partition 0 is
committing.
26/02/15 15:16:19 INFO DataWritingSparkTask: Writer for partition 2 is
committing.
26/02/15 15:16:19 INFO DataWritingSparkTask: Committed partition 0 (task 0,
attempt 0, stage 0.0)
26/02/15 15:16:19 INFO DataWritingSparkTask: Committed partition 2 (task 2,
attempt 0, stage 0.0)
26/02/15 15:16:19 INFO DataWritingSparkTask: Committed partition 1 (task 1,
attempt 0, stage 0.0)
26/02/15 15:16:19 INFO Executor: Finished task 2.0 in stage 0.0 (TID 2).
4161 bytes result sent to driver
26/02/15 15:16:19 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0).
4157 bytes result sent to driver
26/02/15 15:16:19 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1).
4153 bytes result sent to driver
26/02/15 15:16:19 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID
0) in 401 ms on 192.168.100.32 (executor driver) (1/3)
26/02/15 15:16:19 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID
1) in 393 ms on 192.168.100.32 (executor driver) (2/3)
26/02/15 15:16:19 INFO TaskSetManager: Finished task 2.0 in stage 0.0 (TID
2) in 393 ms on 192.168.100.32 (executor driver) (3/3)
26/02/15 15:16:19 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks
have all completed, from pool
26/02/15 15:16:19 INFO DAGScheduler: ResultStage 0 (sql at
NativeMethodAccessorImpl.java:0) finished in 0.450 s
26/02/15 15:16:19 INFO DAGScheduler: Job 0 is finished. Cancelling potential
speculative or zombie tasks for this job
26/02/15 15:16:19 INFO TaskSchedulerImpl: Killing all running tasks in stage
0: Stage finished
26/02/15 15:16:19 INFO DAGScheduler: Job 0 finished: sql at
NativeMethodAccessorImpl.java:0, took 0.477432 s
26/02/15 15:16:19 INFO AppendDataExec: Data source write support
IcebergBatchWrite(table=local.default.test_table, format=PARQUET) is committing.
26/02/15 15:16:19 INFO SparkWrite: Committing append with 3 new data files
to table local.default.test_table
26/02/15 15:16:20 INFO HadoopTableOperations: Committed a new metadata file
file:/tmp/iceberg_warehouse/default/test_table/metadata/v2.metadata.json
26/02/15 15:16:20 INFO SnapshotProducer: Committed snapshot
7722039398521868759 (MergeAppend)
26/02/15 15:16:20 INFO LoggingMetricsReporter: Received metrics report:
CommitReport{tableName=local.default.test_table,
snapshotId=7722039398521868759, sequenceNumber=1, operation=append,
commitMetrics=CommitMetricsResult{totalDuration=TimerResult{timeUnit=NANOSECONDS,
totalDuration=PT0.239004458S, count=1}, attempts=CounterResult{unit=COUNT,
value=1}, addedDataFiles=CounterResult{unit=COUNT, value=3},
removedDataFiles=null, totalDataFiles=CounterResult{unit=COUNT, value=3},
addedDeleteFiles=null, addedEqualityDeleteFiles=null,
addedPositionalDeleteFiles=null, addedDVs=null, removedDeleteFiles=null,
removedEqualityDeleteFiles=null, removedPositionalDeleteFiles=null,
removedDVs=null, totalDeleteFiles=CounterResult{unit=COUNT, value=0},
addedRecords=CounterResult{unit=COUNT, value=3}, removedRecords=null,
totalRecords=CounterResult{unit=COUNT, value=3},
addedFilesSizeInBytes=CounterResult{unit=BYTES, value=1920},
removedFilesSizeInBytes=null, totalFilesSizeInBytes=CounterResult{uni
t=BYTES, value=1920}, addedPositionalDeletes=null,
removedPositionalDeletes=null, totalPositionalDeletes=CounterResult{unit=COUNT,
value=0}, addedEqualityDeletes=null, removedEqualityDeletes=null,
totalEqualityDeletes=CounterResult{unit=COUNT, value=0}, manifestsCreated=null,
manifestsReplaced=null, manifestsKept=null, manifestEntriesProcessed=null},
metadata={engine-version=3.5.5, app-id=local-1771157777707, engine-name=spark,
iceberg-version=Apache Iceberg 1.10.0 (commit
2114bf631e49af532d66e2ce148ee49dd1dd1f1f)}}
26/02/15 15:16:20 INFO SparkWrite: Committed in 255 ms
26/02/15 15:16:20 INFO AppendDataExec: Data source write support
IcebergBatchWrite(table=local.default.test_table, format=PARQUET) committed.
================================================================================
Testing input_file_name() on Iceberg table
================================================================================
=== input_file_name() Results ===
26/02/15 15:16:20 INFO V2ScanRelationPushDown:
Output: id#7, name#8
26/02/15 15:16:20 INFO SnapshotScan: Scanning table local.default.test_table
snapshot 7722039398521868759 created at 2026-02-15T12:16:20.033+00:00 with
filter true
26/02/15 15:16:20 INFO BaseDistributedDataScan: Planning file tasks locally
for table local.default.test_table
26/02/15 15:16:20 INFO SparkPartitioningAwareScan: Reporting
UnknownPartitioning with 1 partition(s) for table local.default.test_table
26/02/15 15:16:20 INFO MemoryStore: Block broadcast_2 stored as values in
memory (estimated size 32.0 KiB, free 2.4 GiB)
26/02/15 15:16:20 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes
in memory (estimated size 30.0 KiB, free 2.4 GiB)
26/02/15 15:16:20 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory
on 192.168.100.32:49684 (size: 30.0 KiB, free: 2.4 GiB)
26/02/15 15:16:20 INFO SparkContext: Created broadcast 2 from collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:56
26/02/15 15:16:20 INFO MemoryStore: Block broadcast_3 stored as values in
memory (estimated size 32.0 KiB, free 2.4 GiB)
26/02/15 15:16:20 INFO MemoryStore: Block broadcast_3_piece0 stored as bytes
in memory (estimated size 29.9 KiB, free 2.4 GiB)
26/02/15 15:16:20 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory
on 192.168.100.32:49684 (size: 29.9 KiB, free: 2.4 GiB)
26/02/15 15:16:20 INFO SparkContext: Created broadcast 3 from collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:56
26/02/15 15:16:20 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:20 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:20 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:20 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:20 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:20 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:20 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:20 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:20 WARN GlutenFallbackReporter: Validation failed for plan:
Project[QueryId=2], due to: fallback input file expression
26/02/15 15:16:20 INFO CodeGenerator: Code generated in 15.250084 ms
26/02/15 15:16:20 INFO MemoryStore: Block broadcast_4 stored as values in
memory (estimated size 32.0 KiB, free 2.4 GiB)
26/02/15 15:16:20 INFO MemoryStore: Block broadcast_4_piece0 stored as bytes
in memory (estimated size 30.0 KiB, free 2.4 GiB)
26/02/15 15:16:20 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory
on 192.168.100.32:49684 (size: 30.0 KiB, free: 2.4 GiB)
26/02/15 15:16:20 INFO SparkContext: Created broadcast 4 from collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:56
26/02/15 15:16:20 INFO BlockManagerInfo: Removed broadcast_3_piece0 on
192.168.100.32:49684 in memory (size: 29.9 KiB, free: 2.4 GiB)
26/02/15 15:16:20 INFO BlockManagerInfo: Removed broadcast_1_piece0 on
192.168.100.32:49684 in memory (size: 4.4 KiB, free: 2.4 GiB)
26/02/15 15:16:20 INFO BlockManagerInfo: Removed broadcast_0_piece0 on
192.168.100.32:49684 in memory (size: 29.7 KiB, free: 2.4 GiB)
26/02/15 15:16:20 INFO SparkContext: Starting job: collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:56
26/02/15 15:16:20 INFO DAGScheduler: Got job 1 (collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:56)
with 1 output partitions
26/02/15 15:16:20 INFO DAGScheduler: Final stage: ResultStage 1 (collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:56)
26/02/15 15:16:20 INFO DAGScheduler: Parents of final stage: List()
26/02/15 15:16:20 INFO DAGScheduler: Missing parents: List()
26/02/15 15:16:20 INFO DAGScheduler: Submitting ResultStage 1
(MapPartitionsRDD[8] at collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:56),
which has no missing parents
26/02/15 15:16:20 INFO MemoryStore: Block broadcast_5 stored as values in
memory (estimated size 29.5 KiB, free 2.4 GiB)
26/02/15 15:16:20 INFO MemoryStore: Block broadcast_5_piece0 stored as bytes
in memory (estimated size 11.7 KiB, free 2.4 GiB)
26/02/15 15:16:20 INFO BlockManagerInfo: Added broadcast_5_piece0 in memory
on 192.168.100.32:49684 (size: 11.7 KiB, free: 2.4 GiB)
26/02/15 15:16:20 INFO SparkContext: Created broadcast 5 from broadcast at
DAGScheduler.scala:1585
26/02/15 15:16:20 INFO DAGScheduler: Submitting 1 missing tasks from
ResultStage 1 (MapPartitionsRDD[8] at collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:56)
(first 15 tasks are for partitions Vector(0))
26/02/15 15:16:20 INFO TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
resource profile 0
26/02/15 15:16:20 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID
3) (192.168.100.32, executor driver, partition 0, PROCESS_LOCAL, 11766 bytes)
26/02/15 15:16:20 INFO Executor: Running task 0.0 in stage 1.0 (TID 3)
26/02/15 15:16:20 INFO CodeGenerator: Code generated in 7.91075 ms
26/02/15 15:16:20 INFO BaseAllocator: Debug mode disabled. Enable with the
VM option -Darrow.memory.debug.allocator=true.
26/02/15 15:16:20 INFO DefaultAllocationManagerOption: allocation manager
type not specified, using netty as the default type
26/02/15 15:16:20 INFO CheckAllocator: Using DefaultAllocationManager at
memory/DefaultAllocationManagerFactory.class
26/02/15 15:16:20 INFO CodeGenerator: Code generated in 4.909666 ms
26/02/15 15:16:20 INFO Executor: Finished task 0.0 in stage 1.0 (TID 3).
8350 bytes result sent to driver
26/02/15 15:16:20 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID
3) in 118 ms on 192.168.100.32 (executor driver) (1/1)
26/02/15 15:16:20 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks
have all completed, from pool
26/02/15 15:16:20 INFO DAGScheduler: ResultStage 1 (collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:56)
finished in 0.122 s
26/02/15 15:16:20 INFO DAGScheduler: Job 1 is finished. Cancelling potential
speculative or zombie tasks for this job
26/02/15 15:16:20 INFO TaskSchedulerImpl: Killing all running tasks in stage
1: Stage finished
26/02/15 15:16:20 INFO DAGScheduler: Job 1 finished: collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:56,
took 0.123958 s
ID: 1, Name: Alice, File:
'file:/tmp/iceberg_warehouse/default/test_table/data/00000-0-2b0c5d04-98fb-4cda-bdf3-7dac021f8032-0-00001.parquet'
ID: 2, Name: Bob, File:
'file:/tmp/iceberg_warehouse/default/test_table/data/00001-1-2b0c5d04-98fb-4cda-bdf3-7dac021f8032-0-00001.parquet'
ID: 3, Name: Charlie, File:
'file:/tmp/iceberg_warehouse/default/test_table/data/00002-2-2b0c5d04-98fb-4cda-bdf3-7dac021f8032-0-00001.parquet'
✅ SUCCESS: All 3 rows have valid file paths
================================================================================
Testing input_file_block_start() on Iceberg table
================================================================================
=== input_file_block_start() Results ===
26/02/15 15:16:20 INFO V2ScanRelationPushDown:
Output: id#24, name#25
26/02/15 15:16:20 INFO SnapshotScan: Scanning table local.default.test_table
snapshot 7722039398521868759 created at 2026-02-15T12:16:20.033+00:00 with
filter true
26/02/15 15:16:20 INFO BaseDistributedDataScan: Planning file tasks locally
for table local.default.test_table
26/02/15 15:16:20 INFO SparkPartitioningAwareScan: Reporting
UnknownPartitioning with 1 partition(s) for table local.default.test_table
26/02/15 15:16:20 INFO MemoryStore: Block broadcast_6 stored as values in
memory (estimated size 32.0 KiB, free 2.4 GiB)
26/02/15 15:16:20 INFO MemoryStore: Block broadcast_6_piece0 stored as bytes
in memory (estimated size 29.9 KiB, free 2.4 GiB)
26/02/15 15:16:20 INFO BlockManagerInfo: Added broadcast_6_piece0 in memory
on 192.168.100.32:49684 (size: 29.9 KiB, free: 2.4 GiB)
26/02/15 15:16:20 INFO SparkContext: Created broadcast 6 from collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:82
26/02/15 15:16:20 INFO MemoryStore: Block broadcast_7 stored as values in
memory (estimated size 32.0 KiB, free 2.4 GiB)
26/02/15 15:16:20 INFO MemoryStore: Block broadcast_7_piece0 stored as bytes
in memory (estimated size 30.0 KiB, free 2.4 GiB)
26/02/15 15:16:20 INFO BlockManagerInfo: Added broadcast_7_piece0 in memory
on 192.168.100.32:49684 (size: 30.0 KiB, free: 2.4 GiB)
26/02/15 15:16:20 INFO SparkContext: Created broadcast 7 from collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:82
26/02/15 15:16:20 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:20 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:20 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:20 INFO BlockManagerInfo: Removed broadcast_5_piece0 on
192.168.100.32:49684 in memory (size: 11.7 KiB, free: 2.4 GiB)
26/02/15 15:16:20 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:20 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:20 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:20 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:20 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:20 WARN GlutenFallbackReporter: Validation failed for plan:
Project[QueryId=3], due to: fallback input file expression
26/02/15 15:16:20 INFO CodeGenerator: Code generated in 7.716333 ms
26/02/15 15:16:20 INFO MemoryStore: Block broadcast_8 stored as values in
memory (estimated size 32.0 KiB, free 2.4 GiB)
26/02/15 15:16:20 INFO MemoryStore: Block broadcast_8_piece0 stored as bytes
in memory (estimated size 29.9 KiB, free 2.4 GiB)
26/02/15 15:16:20 INFO BlockManagerInfo: Added broadcast_8_piece0 in memory
on 192.168.100.32:49684 (size: 29.9 KiB, free: 2.4 GiB)
26/02/15 15:16:20 INFO SparkContext: Created broadcast 8 from collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:82
26/02/15 15:16:20 INFO SparkContext: Starting job: collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:82
26/02/15 15:16:20 INFO DAGScheduler: Got job 2 (collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:82)
with 1 output partitions
26/02/15 15:16:20 INFO DAGScheduler: Final stage: ResultStage 2 (collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:82)
26/02/15 15:16:20 INFO DAGScheduler: Parents of final stage: List()
26/02/15 15:16:20 INFO DAGScheduler: Missing parents: List()
26/02/15 15:16:20 INFO DAGScheduler: Submitting ResultStage 2
(MapPartitionsRDD[15] at collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:82),
which has no missing parents
26/02/15 15:16:20 INFO BlockManagerInfo: Removed broadcast_7_piece0 on
192.168.100.32:49684 in memory (size: 30.0 KiB, free: 2.4 GiB)
26/02/15 15:16:20 INFO MemoryStore: Block broadcast_9 stored as values in
memory (estimated size 29.7 KiB, free 2.4 GiB)
26/02/15 15:16:20 INFO MemoryStore: Block broadcast_9_piece0 stored as bytes
in memory (estimated size 11.8 KiB, free 2.4 GiB)
26/02/15 15:16:20 INFO BlockManagerInfo: Added broadcast_9_piece0 in memory
on 192.168.100.32:49684 (size: 11.8 KiB, free: 2.4 GiB)
26/02/15 15:16:20 INFO SparkContext: Created broadcast 9 from broadcast at
DAGScheduler.scala:1585
26/02/15 15:16:20 INFO DAGScheduler: Submitting 1 missing tasks from
ResultStage 2 (MapPartitionsRDD[15] at collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:82)
(first 15 tasks are for partitions Vector(0))
26/02/15 15:16:20 INFO TaskSchedulerImpl: Adding task set 2.0 with 1 tasks
resource profile 0
26/02/15 15:16:20 INFO TaskSetManager: Starting task 0.0 in stage 2.0 (TID
4) (192.168.100.32, executor driver, partition 0, PROCESS_LOCAL, 11786 bytes)
26/02/15 15:16:20 INFO Executor: Running task 0.0 in stage 2.0 (TID 4)
26/02/15 15:16:20 INFO CodeGenerator: Code generated in 3.413875 ms
26/02/15 15:16:20 INFO CodeGenerator: Code generated in 4.24975 ms
26/02/15 15:16:20 INFO Executor: Finished task 0.0 in stage 2.0 (TID 4).
8212 bytes result sent to driver
26/02/15 15:16:20 INFO TaskSetManager: Finished task 0.0 in stage 2.0 (TID
4) in 22 ms on 192.168.100.32 (executor driver) (1/1)
26/02/15 15:16:20 INFO TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks
have all completed, from pool
26/02/15 15:16:20 INFO DAGScheduler: ResultStage 2 (collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:82)
finished in 0.028 s
26/02/15 15:16:20 INFO DAGScheduler: Job 2 is finished. Cancelling potential
speculative or zombie tasks for this job
26/02/15 15:16:20 INFO TaskSchedulerImpl: Killing all running tasks in stage
2: Stage finished
26/02/15 15:16:20 INFO DAGScheduler: Job 2 finished: collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:82,
took 0.035285 s
ID: 1, Name: Alice, Block Start: 4
ID: 2, Name: Bob, Block Start: 4
ID: 3, Name: Charlie, Block Start: 4
✅ SUCCESS: All 3 rows have valid block start positions
================================================================================
Testing input_file_block_length() on Iceberg table
================================================================================
=== input_file_block_length() Results ===
26/02/15 15:16:20 INFO V2ScanRelationPushDown:
Output: id#41, name#42
26/02/15 15:16:20 INFO SnapshotScan: Scanning table local.default.test_table
snapshot 7722039398521868759 created at 2026-02-15T12:16:20.033+00:00 with
filter true
26/02/15 15:16:20 INFO BaseDistributedDataScan: Planning file tasks locally
for table local.default.test_table
26/02/15 15:16:20 INFO SparkPartitioningAwareScan: Reporting
UnknownPartitioning with 1 partition(s) for table local.default.test_table
26/02/15 15:16:20 INFO MemoryStore: Block broadcast_10 stored as values in
memory (estimated size 32.0 KiB, free 2.4 GiB)
26/02/15 15:16:20 INFO MemoryStore: Block broadcast_10_piece0 stored as
bytes in memory (estimated size 30.0 KiB, free 2.4 GiB)
26/02/15 15:16:20 INFO BlockManagerInfo: Added broadcast_10_piece0 in memory
on 192.168.100.32:49684 (size: 30.0 KiB, free: 2.4 GiB)
26/02/15 15:16:20 INFO SparkContext: Created broadcast 10 from collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:106
26/02/15 15:16:20 INFO MemoryStore: Block broadcast_11 stored as values in
memory (estimated size 32.0 KiB, free 2.4 GiB)
26/02/15 15:16:20 INFO MemoryStore: Block broadcast_11_piece0 stored as
bytes in memory (estimated size 29.9 KiB, free 2.4 GiB)
26/02/15 15:16:20 INFO BlockManagerInfo: Added broadcast_11_piece0 in memory
on 192.168.100.32:49684 (size: 29.9 KiB, free: 2.4 GiB)
26/02/15 15:16:20 INFO BlockManagerInfo: Removed broadcast_9_piece0 on
192.168.100.32:49684 in memory (size: 11.8 KiB, free: 2.4 GiB)
26/02/15 15:16:20 INFO SparkContext: Created broadcast 11 from collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:106
26/02/15 15:16:20 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:20 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:20 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:20 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:20 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:20 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:20 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:20 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:20 WARN GlutenFallbackReporter: Validation failed for plan:
Project[QueryId=4], due to: fallback input file expression
26/02/15 15:16:20 INFO MemoryStore: Block broadcast_12 stored as values in
memory (estimated size 32.0 KiB, free 2.4 GiB)
26/02/15 15:16:20 INFO MemoryStore: Block broadcast_12_piece0 stored as
bytes in memory (estimated size 30.0 KiB, free 2.4 GiB)
26/02/15 15:16:20 INFO BlockManagerInfo: Added broadcast_12_piece0 in memory
on 192.168.100.32:49684 (size: 30.0 KiB, free: 2.4 GiB)
26/02/15 15:16:20 INFO SparkContext: Created broadcast 12 from collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:106
26/02/15 15:16:20 INFO BlockManagerInfo: Removed broadcast_11_piece0 on
192.168.100.32:49684 in memory (size: 29.9 KiB, free: 2.4 GiB)
26/02/15 15:16:20 INFO SparkContext: Starting job: collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:106
26/02/15 15:16:20 INFO DAGScheduler: Got job 3 (collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:106)
with 1 output partitions
26/02/15 15:16:20 INFO DAGScheduler: Final stage: ResultStage 3 (collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:106)
26/02/15 15:16:20 INFO DAGScheduler: Parents of final stage: List()
26/02/15 15:16:20 INFO DAGScheduler: Missing parents: List()
26/02/15 15:16:20 INFO DAGScheduler: Submitting ResultStage 3
(MapPartitionsRDD[22] at collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:106),
which has no missing parents
26/02/15 15:16:20 INFO MemoryStore: Block broadcast_13 stored as values in
memory (estimated size 29.7 KiB, free 2.4 GiB)
26/02/15 15:16:20 INFO MemoryStore: Block broadcast_13_piece0 stored as
bytes in memory (estimated size 11.8 KiB, free 2.4 GiB)
26/02/15 15:16:20 INFO BlockManagerInfo: Added broadcast_13_piece0 in memory
on 192.168.100.32:49684 (size: 11.8 KiB, free: 2.4 GiB)
26/02/15 15:16:20 INFO SparkContext: Created broadcast 13 from broadcast at
DAGScheduler.scala:1585
26/02/15 15:16:20 INFO DAGScheduler: Submitting 1 missing tasks from
ResultStage 3 (MapPartitionsRDD[22] at collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:106)
(first 15 tasks are for partitions Vector(0))
26/02/15 15:16:20 INFO TaskSchedulerImpl: Adding task set 3.0 with 1 tasks
resource profile 0
26/02/15 15:16:20 INFO TaskSetManager: Starting task 0.0 in stage 3.0 (TID
5) (192.168.100.32, executor driver, partition 0, PROCESS_LOCAL, 11795 bytes)
26/02/15 15:16:20 INFO Executor: Running task 0.0 in stage 3.0 (TID 5)
26/02/15 15:16:20 INFO Executor: Finished task 0.0 in stage 3.0 (TID 5).
8220 bytes result sent to driver
26/02/15 15:16:20 INFO TaskSetManager: Finished task 0.0 in stage 3.0 (TID
5) in 17 ms on 192.168.100.32 (executor driver) (1/1)
26/02/15 15:16:20 INFO TaskSchedulerImpl: Removed TaskSet 3.0, whose tasks
have all completed, from pool
26/02/15 15:16:20 INFO DAGScheduler: ResultStage 3 (collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:106)
finished in 0.019 s
26/02/15 15:16:20 INFO DAGScheduler: Job 3 is finished. Cancelling potential
speculative or zombie tasks for this job
26/02/15 15:16:20 INFO TaskSchedulerImpl: Killing all running tasks in stage
3: Stage finished
26/02/15 15:16:20 INFO DAGScheduler: Job 3 finished: collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:106,
took 0.020702 s
ID: 1, Name: Alice, Block Length: 636
ID: 2, Name: Bob, Block Length: 622
ID: 3, Name: Charlie, Block Length: 650
✅ SUCCESS: All 3 rows have valid block lengths
================================================================================
Testing all three metadata functions together
================================================================================
=== All Metadata Functions Results ===
26/02/15 15:16:20 INFO V2ScanRelationPushDown:
Output: id#60, name#61
26/02/15 15:16:20 INFO SnapshotScan: Scanning table local.default.test_table
snapshot 7722039398521868759 created at 2026-02-15T12:16:20.033+00:00 with
filter true
26/02/15 15:16:20 INFO BaseDistributedDataScan: Planning file tasks locally
for table local.default.test_table
26/02/15 15:16:20 INFO SparkPartitioningAwareScan: Reporting
UnknownPartitioning with 1 partition(s) for table local.default.test_table
26/02/15 15:16:20 INFO MemoryStore: Block broadcast_14 stored as values in
memory (estimated size 32.0 KiB, free 2.4 GiB)
26/02/15 15:16:20 INFO MemoryStore: Block broadcast_14_piece0 stored as
bytes in memory (estimated size 29.9 KiB, free 2.4 GiB)
26/02/15 15:16:20 INFO BlockManagerInfo: Added broadcast_14_piece0 in memory
on 192.168.100.32:49684 (size: 29.9 KiB, free: 2.4 GiB)
26/02/15 15:16:20 INFO SparkContext: Created broadcast 14 from collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:132
26/02/15 15:16:21 INFO MemoryStore: Block broadcast_15 stored as values in
memory (estimated size 32.0 KiB, free 2.4 GiB)
26/02/15 15:16:21 INFO MemoryStore: Block broadcast_15_piece0 stored as
bytes in memory (estimated size 30.0 KiB, free 2.4 GiB)
26/02/15 15:16:21 INFO BlockManagerInfo: Added broadcast_15_piece0 in memory
on 192.168.100.32:49684 (size: 30.0 KiB, free: 2.4 GiB)
26/02/15 15:16:21 INFO SparkContext: Created broadcast 15 from collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:132
26/02/15 15:16:21 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:21 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:21 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:21 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:21 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:21 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:21 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:21 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:21 WARN GlutenFallbackReporter: Validation failed for plan:
Project[QueryId=5], due to: fallback input file expression
26/02/15 15:16:21 INFO CodeGenerator: Code generated in 6.058208 ms
26/02/15 15:16:21 INFO MemoryStore: Block broadcast_16 stored as values in
memory (estimated size 32.0 KiB, free 2.4 GiB)
26/02/15 15:16:21 INFO MemoryStore: Block broadcast_16_piece0 stored as
bytes in memory (estimated size 29.9 KiB, free 2.4 GiB)
26/02/15 15:16:21 INFO BlockManagerInfo: Added broadcast_16_piece0 in memory
on 192.168.100.32:49684 (size: 29.9 KiB, free: 2.4 GiB)
26/02/15 15:16:21 INFO SparkContext: Created broadcast 16 from collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:132
26/02/15 15:16:21 INFO SparkContext: Starting job: collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:132
26/02/15 15:16:21 INFO DAGScheduler: Got job 4 (collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:132)
with 1 output partitions
26/02/15 15:16:21 INFO DAGScheduler: Final stage: ResultStage 4 (collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:132)
26/02/15 15:16:21 INFO DAGScheduler: Parents of final stage: List()
26/02/15 15:16:21 INFO DAGScheduler: Missing parents: List()
26/02/15 15:16:21 INFO DAGScheduler: Submitting ResultStage 4
(MapPartitionsRDD[29] at collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:132),
which has no missing parents
26/02/15 15:16:21 INFO MemoryStore: Block broadcast_17 stored as values in
memory (estimated size 30.2 KiB, free 2.4 GiB)
26/02/15 15:16:21 INFO MemoryStore: Block broadcast_17_piece0 stored as
bytes in memory (estimated size 12.0 KiB, free 2.4 GiB)
26/02/15 15:16:21 INFO BlockManagerInfo: Added broadcast_17_piece0 in memory
on 192.168.100.32:49684 (size: 12.0 KiB, free: 2.4 GiB)
26/02/15 15:16:21 INFO SparkContext: Created broadcast 17 from broadcast at
DAGScheduler.scala:1585
26/02/15 15:16:21 INFO DAGScheduler: Submitting 1 missing tasks from
ResultStage 4 (MapPartitionsRDD[29] at collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:132)
(first 15 tasks are for partitions Vector(0))
26/02/15 15:16:21 INFO TaskSchedulerImpl: Adding task set 4.0 with 1 tasks
resource profile 0
26/02/15 15:16:21 INFO TaskSetManager: Starting task 0.0 in stage 4.0 (TID
6) (192.168.100.32, executor driver, partition 0, PROCESS_LOCAL, 11989 bytes)
26/02/15 15:16:21 INFO Executor: Running task 0.0 in stage 4.0 (TID 6)
26/02/15 15:16:21 INFO CodeGenerator: Code generated in 4.262917 ms
26/02/15 15:16:21 INFO CodeGenerator: Code generated in 10.714417 ms
26/02/15 15:16:21 INFO Executor: Finished task 0.0 in stage 4.0 (TID 6).
8369 bytes result sent to driver
26/02/15 15:16:21 INFO TaskSetManager: Finished task 0.0 in stage 4.0 (TID
6) in 42 ms on 192.168.100.32 (executor driver) (1/1)
26/02/15 15:16:21 INFO TaskSchedulerImpl: Removed TaskSet 4.0, whose tasks
have all completed, from pool
26/02/15 15:16:21 INFO DAGScheduler: ResultStage 4 (collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:132)
finished in 0.047 s
26/02/15 15:16:21 INFO DAGScheduler: Job 4 is finished. Cancelling potential
speculative or zombie tasks for this job
26/02/15 15:16:21 INFO TaskSchedulerImpl: Killing all running tasks in stage
4: Stage finished
26/02/15 15:16:21 INFO DAGScheduler: Job 4 finished: collect at
/Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:132,
took 0.050441 s
ID: 1, Name: Alice
File:
'file:/tmp/iceberg_warehouse/default/test_table/data/00000-0-2b0c5d04-98fb-4cda-bdf3-7dac021f8032-0-00001.parquet'
Block Start: 4
Block Length: 636
ID: 2, Name: Bob
File:
'file:/tmp/iceberg_warehouse/default/test_table/data/00001-1-2b0c5d04-98fb-4cda-bdf3-7dac021f8032-0-00001.parquet'
Block Start: 4
Block Length: 622
ID: 3, Name: Charlie
File:
'file:/tmp/iceberg_warehouse/default/test_table/data/00002-2-2b0c5d04-98fb-4cda-bdf3-7dac021f8032-0-00001.parquet'
Block Start: 4
Block Length: 650
✅ ALL TESTS PASSED: All metadata functions work correctly!
26/02/15 15:16:21 INFO SparkContext: SparkContext is stopping with exitCode
0.
26/02/15 15:16:21 INFO SparkUI: Stopped Spark web UI at
http://192.168.100.32:4040
26/02/15 15:16:21 INFO MapOutputTrackerMasterEndpoint:
MapOutputTrackerMasterEndpoint stopped!
26/02/15 15:16:21 INFO MemoryStore: MemoryStore cleared
26/02/15 15:16:21 INFO BlockManager: BlockManager stopped
26/02/15 15:16:21 INFO BlockManagerMaster: BlockManagerMaster stopped
26/02/15 15:16:21 INFO
OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:
OutputCommitCoordinator stopped!
26/02/15 15:16:21 INFO SparkContext: Successfully stopped SparkContext
26/02/15 15:16:21 INFO ShutdownHookManager: Shutdown hook called
26/02/15 15:16:21 INFO ShutdownHookManager: Deleting directory
/private/var/folders/5z/4mxhbysx1hb00rzxt6wj738m0000gn/T/spark-dbc8040a-b030-4017-b078-49a1acd7c001/pyspark-fe5f02ee-61ab-470d-a651-4a4eb74b901e
26/02/15 15:16:21 INFO ShutdownHookManager: Deleting directory
/private/var/folders/5z/4mxhbysx1hb00rzxt6wj738m0000gn/T/spark-0482ef35-5e29-4648-b2b4-9fc9e31eccac
26/02/15 15:16:21 INFO ShutdownHookManager: Deleting directory
/private/var/folders/5z/4mxhbysx1hb00rzxt6wj738m0000gn/T/spark-dbc8040a-b030-4017-b078-49a1acd7c001
[Process completed]
```
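The run above is the pre-change ("Before") baseline: Gluten reports `fallback input file expression` for each query, and vanilla Spark still produces correct metadata values. The `test_iceberg_simple.py` script itself is not included in this comment; purely as a reference, the `SUCCESS` lines it prints are consistent with per-row validation logic along these lines (the dict values are copied from the log output above, while the field names and `validate` helper are illustrative assumptions, not the actual script):

```python
# Sketch of the per-row checks behind the "SUCCESS" lines above.
# The dicts stand in for the Rows test_iceberg_simple.py would collect();
# values are taken from the log output, field names are assumed.
rows = [
    {"id": 1, "name": "Alice",
     "file": "file:/tmp/iceberg_warehouse/default/test_table/data/"
             "00000-0-2b0c5d04-98fb-4cda-bdf3-7dac021f8032-0-00001.parquet",
     "block_start": 4, "block_length": 636},
    {"id": 2, "name": "Bob",
     "file": "file:/tmp/iceberg_warehouse/default/test_table/data/"
             "00001-1-2b0c5d04-98fb-4cda-bdf3-7dac021f8032-0-00001.parquet",
     "block_start": 4, "block_length": 622},
    {"id": 3, "name": "Charlie",
     "file": "file:/tmp/iceberg_warehouse/default/test_table/data/"
             "00002-2-2b0c5d04-98fb-4cda-bdf3-7dac021f8032-0-00001.parquet",
     "block_start": 4, "block_length": 650},
]

def validate(rows):
    """Return True iff every row carries usable input-file metadata."""
    for r in rows:
        # input_file_name() must yield a non-empty path to a data file
        assert r["file"].startswith("file:/") and r["file"].endswith(".parquet")
        # input_file_block_start() is a byte offset, never negative
        assert r["block_start"] >= 0
        # input_file_block_length() must be a positive byte count
        assert r["block_length"] > 0
    return True
```

The "After" run (with the fix applied) would be expected to produce the same row values without the `GlutenFallbackReporter` warnings.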
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]