ddsr-ops opened a new issue #2447:
URL: https://github.com/apache/iceberg/issues/2447
# Environment
- Hadoop: 3.0.0-cdh6.2.1
- Hive: 2.3.7
- Spark: 3.0.1
- Flink: 1.11.3
- Iceberg: 0.11.0
# Steps
1. Create an Iceberg table in spark-sql:
```
./spark-sql --packages org.apache.iceberg:iceberg-spark3-runtime:0.11.0 \
  --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
  --conf spark.sql.catalog.hadoop_catalog=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.hadoop_catalog.type=hadoop \
  --conf spark.sql.catalog.hadoop_catalog.warehouse=hdfs://xxx:8020/hdp_data/hive/warehouse \
  --master yarn
```
```
spark-sql> create table isolation(id int, msg string) using iceberg;
Time taken: 0.15 seconds
21/04/09 17:56:06 INFO SparkSQLCLIDriver: Time taken: 0.15 seconds
spark-sql> desc formatted isolation;
21/04/09 17:56:18 INFO CodeGenerator: Code generated in 18.984 ms
21/04/09 17:56:18 INFO CodeGenerator: Code generated in 7.674588 ms
id int
msg string
Partitioning
Not partitioned
Detailed Table Information
Name hadoop_catalog.logging.isolation
Provider iceberg
Owner hive
Table Properties [current-snapshot-id=none,format=iceberg/parquet]
Time taken: 0.113 seconds, Fetched 11 row(s)
21/04/09 17:56:18 INFO SparkSQLCLIDriver: Time taken: 0.113 seconds, Fetched
11 row(s)
spark-sql> select count(*) from isolation;
0
Time taken: 1.536 seconds, Fetched 1 row(s)
21/04/09 18:00:22 INFO SparkSQLCLIDriver: Time taken: 1.536 seconds, Fetched
1 row(s)
```
2. Insert a row into the `isolation` table in flink-sql:
```
./bin/sql-client.sh embedded -j ext_lib/iceberg-flink-runtime-0.11.0.jar \
  -j ext_lib/flink-sql-connector-hive-2.3.6_2.11-1.11.3.jar
Flink SQL> CREATE CATALOG hadoop_catalog WITH (
> 'type'='iceberg',
> 'catalog-type'='hadoop',
> 'warehouse'='hdfs://hadoop189:8020/hdp_data/hive/warehouse',
> 'property-version'='1'
> );
[INFO] Catalog has been created.
Flink SQL> use catalog hadoop_catalog;
Flink SQL> use logging;
Flink SQL> show tables;
isolation
Flink SQL> select count(*) from isolation; -- return 0
[INFO] Result retrieval cancelled.
Flink SQL> insert into isolation values(1, 'flink');
[INFO] Submitting SQL update statement to the cluster...
[INFO] Table update statement has been successfully submitted to the cluster:
Job ID: c095a1649350e007a4670710061b5536
Flink SQL> select * from isolation;
[INFO] Result retrieval cancelled.
id msg
1 flink
```
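One way to narrow this down might be to confirm, from the Spark side, that the Flink insert actually committed a new snapshot. Iceberg exposes metadata tables for this; assuming the `snapshots` metadata table is reachable through this catalog setup (unverified on this exact environment), a diagnostic query could look like:

```
-- Diagnostic sketch: list the table's snapshots from spark-sql.
-- If the Flink write committed, a new snapshot row should appear here
-- even while a plain SELECT still returns no data.
SELECT snapshot_id, committed_at, operation
FROM hadoop_catalog.logging.isolation.snapshots;
```

If a snapshot from the Flink job shows up here, the data is committed and the problem is on the read path in the Spark session, not in the Flink write.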
3. Query the `isolation` table in spark-sql; no result is returned.
```
spark-sql> select * from isolation;
21/04/09 18:08:22 INFO MemoryStore: Block broadcast_7 stored as values in
memory (estimated size 629.5 KiB, free 363.7 MiB)
21/04/09 18:08:22 INFO MemoryStore: Block broadcast_7_piece0 stored as bytes
in memory (estimated size 53.9 KiB, free 363.6 MiB)
21/04/09 18:08:22 INFO BlockManagerInfo: Added broadcast_7_piece0 in memory
on hadoop189:37478 (size: 53.9 KiB, free: 366.1 MiB)
21/04/09 18:08:22 INFO SparkContext: Created broadcast 7 from broadcast at
SparkScanBuilder.java:171
21/04/09 18:08:22 INFO MemoryStore: Block broadcast_8 stored as values in
memory (estimated size 40.0 B, free 363.6 MiB)
21/04/09 18:08:22 INFO MemoryStore: Block broadcast_8_piece0 stored as bytes
in memory (estimated size 116.0 B, free 363.6 MiB)
21/04/09 18:08:22 INFO BlockManagerInfo: Added broadcast_8_piece0 in memory
on hadoop189:37478 (size: 116.0 B, free: 366.1 MiB)
21/04/09 18:08:22 INFO SparkContext: Created broadcast 8 from broadcast at
SparkScanBuilder.java:172
21/04/09 18:08:22 INFO V2ScanRelationPushDown:
Pushing operators to hadoop_catalog.logging.isolation
Pushed Filters:
Post-Scan Filters:
Output: id#165, msg#166
21/04/09 18:08:22 INFO BaseTableScan: Scanning empty table
hadoop_catalog.logging.isolation
21/04/09 18:08:22 INFO SparkContext: Starting job: main at
NativeMethodAccessorImpl.java:0
21/04/09 18:08:22 INFO DAGScheduler: Job 3 finished: main at
NativeMethodAccessorImpl.java:0, took 0.000113 s
Time taken: 0.059 seconds
21/04/09 18:08:22 INFO SparkSQLCLIDriver: Time taken: 0.059 seconds
```
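The telling line above is `BaseTableScan: Scanning empty table`: the Spark session appears to be reading a stale, cached view of the table rather than the snapshot that Flink committed. A possible workaround, assuming stale metadata is the cause (untested here), is to force Spark to reload the table before querying:

```
-- Hypothetical workaround: force spark-sql to reload the table metadata
-- so the snapshot committed by Flink becomes visible.
REFRESH TABLE hadoop_catalog.logging.isolation;
SELECT * FROM hadoop_catalog.logging.isolation;
```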
4. Insert a row into the `isolation` table in spark-sql and query again; this time both rows (including the one written by Flink) are returned.
```
spark-sql> insert into isolation values(1, 'spark');
21/04/09 18:12:28 INFO MemoryStore: Block broadcast_9 stored as values in
memory (estimated size 629.5 KiB, free 363.0 MiB)
21/04/09 18:12:28 INFO MemoryStore: Block broadcast_9_piece0 stored as bytes
in memory (estimated size 53.9 KiB, free 362.9 MiB)
21/04/09 18:12:28 INFO BlockManagerInfo: Added broadcast_9_piece0 in memory
on hadoop189:37478 (size: 53.9 KiB, free: 366.0 MiB)
21/04/09 18:12:28 INFO SparkContext: Created broadcast 9 from broadcast at
SparkWriteBuilder.java:131
21/04/09 18:12:28 INFO MemoryStore: Block broadcast_10 stored as values in
memory (estimated size 40.0 B, free 362.9 MiB)
21/04/09 18:12:28 INFO MemoryStore: Block broadcast_10_piece0 stored as
bytes in memory (estimated size 116.0 B, free 362.9 MiB)
21/04/09 18:12:28 INFO BlockManagerInfo: Added broadcast_10_piece0 in memory
on hadoop189:37478 (size: 116.0 B, free: 366.0 MiB)
21/04/09 18:12:28 INFO SparkContext: Created broadcast 10 from broadcast at
SparkWriteBuilder.java:132
21/04/09 18:12:28 INFO CodeGenerator: Code generated in 6.157067 ms
21/04/09 18:12:28 INFO AppendDataExec: Start processing data source write
support: IcebergBatchWrite(table=hadoop_catalog.logging.isolation,
format=PARQUET). The input RDD has 1 partitions.
21/04/09 18:12:28 INFO SparkContext: Starting job: main at
NativeMethodAccessorImpl.java:0
21/04/09 18:12:28 INFO DAGScheduler: Got job 4 (main at
NativeMethodAccessorImpl.java:0) with 1 output partitions
21/04/09 18:12:28 INFO DAGScheduler: Final stage: ResultStage 2 (main at
NativeMethodAccessorImpl.java:0)
21/04/09 18:12:28 INFO DAGScheduler: Parents of final stage: List()
21/04/09 18:12:28 INFO DAGScheduler: Missing parents: List()
21/04/09 18:12:28 INFO DAGScheduler: Submitting ResultStage 2
(MapPartitionsRDD[20] at main at NativeMethodAccessorImpl.java:0), which has no
missing parents
21/04/09 18:12:28 INFO MemoryStore: Block broadcast_11 stored as values in
memory (estimated size 7.2 KiB, free 362.9 MiB)
21/04/09 18:12:28 INFO MemoryStore: Block broadcast_11_piece0 stored as
bytes in memory (estimated size 4.0 KiB, free 362.9 MiB)
21/04/09 18:12:28 INFO BlockManagerInfo: Added broadcast_11_piece0 in memory
on hadoop189:37478 (size: 4.0 KiB, free: 366.0 MiB)
21/04/09 18:12:28 INFO SparkContext: Created broadcast 11 from broadcast at
DAGScheduler.scala:1223
21/04/09 18:12:28 INFO DAGScheduler: Submitting 1 missing tasks from
ResultStage 2 (MapPartitionsRDD[20] at main at NativeMethodAccessorImpl.java:0)
(first 15 tasks are for partitions Vector(0))
21/04/09 18:12:28 INFO YarnScheduler: Adding task set 2.0 with 1 tasks
21/04/09 18:12:28 INFO TaskSetManager: Starting task 0.0 in stage 2.0 (TID
1, hadoop189, executor 2, partition 0, PROCESS_LOCAL, 7552 bytes)
21/04/09 18:12:28 INFO BlockManagerInfo: Added broadcast_11_piece0 in memory
on hadoop189:44388 (size: 4.0 KiB, free: 366.3 MiB)
21/04/09 18:12:28 INFO BlockManagerInfo: Added broadcast_9_piece0 in memory
on hadoop189:44388 (size: 53.9 KiB, free: 366.2 MiB)
21/04/09 18:12:29 INFO BlockManagerInfo: Added broadcast_10_piece0 in memory
on hadoop189:44388 (size: 116.0 B, free: 366.2 MiB)
21/04/09 18:12:30 INFO TaskSetManager: Finished task 0.0 in stage 2.0 (TID
1) in 1793 ms on hadoop189 (executor 2) (1/1)
21/04/09 18:12:30 INFO YarnScheduler: Removed TaskSet 2.0, whose tasks have
all completed, from pool
21/04/09 18:12:30 INFO DAGScheduler: ResultStage 2 (main at
NativeMethodAccessorImpl.java:0) finished in 1.802 s
21/04/09 18:12:30 INFO DAGScheduler: Job 4 is finished. Cancelling potential
speculative or zombie tasks for this job
21/04/09 18:12:30 INFO YarnScheduler: Killing all running tasks in stage 2:
Stage finished
21/04/09 18:12:30 INFO DAGScheduler: Job 4 finished: main at
NativeMethodAccessorImpl.java:0, took 1.809582 s
21/04/09 18:12:30 INFO AppendDataExec: Data source write support
IcebergBatchWrite(table=hadoop_catalog.logging.isolation, format=PARQUET) is
committing.
21/04/09 18:12:30 INFO SparkWrite: Committing append with 1 new data files
to table hadoop_catalog.logging.isolation
21/04/09 18:12:30 INFO SnapshotProducer: Committed snapshot
778572246949378127 (MergeAppend)
21/04/09 18:12:30 INFO SparkWrite: Committed in 206 ms
21/04/09 18:12:30 INFO AppendDataExec: Data source write support
IcebergBatchWrite(table=hadoop_catalog.logging.isolation, format=PARQUET)
committed.
Time taken: 2.215 seconds
21/04/09 18:12:30 INFO SparkSQLCLIDriver: Time taken: 2.215 seconds
spark-sql> select * from isolation;
21/04/09 18:12:44 INFO MemoryStore: Block broadcast_12 stored as values in
memory (estimated size 629.5 KiB, free 365.7 MiB)
21/04/09 18:12:44 INFO MemoryStore: Block broadcast_12_piece0 stored as
bytes in memory (estimated size 53.9 KiB, free 365.6 MiB)
21/04/09 18:12:44 INFO BlockManagerInfo: Added broadcast_12_piece0 in memory
on hadoop189:37478 (size: 53.9 KiB, free: 366.2 MiB)
21/04/09 18:12:44 INFO SparkContext: Created broadcast 12 from broadcast at
SparkScanBuilder.java:171
21/04/09 18:12:44 INFO MemoryStore: Block broadcast_13 stored as values in
memory (estimated size 40.0 B, free 365.6 MiB)
21/04/09 18:12:44 INFO MemoryStore: Block broadcast_13_piece0 stored as
bytes in memory (estimated size 116.0 B, free 365.6 MiB)
21/04/09 18:12:44 INFO BlockManagerInfo: Added broadcast_13_piece0 in memory
on hadoop189:37478 (size: 116.0 B, free: 366.2 MiB)
21/04/09 18:12:44 INFO SparkContext: Created broadcast 13 from broadcast at
SparkScanBuilder.java:172
21/04/09 18:12:44 INFO V2ScanRelationPushDown:
Pushing operators to hadoop_catalog.logging.isolation
Pushed Filters:
Post-Scan Filters:
Output: id#177, msg#178
21/04/09 18:12:45 INFO BaseTableScan: Scanning table
hadoop_catalog.logging.isolation snapshot 778572246949378127 created at
2021-04-09 18:12:30.771 with filter true
21/04/09 18:12:45 INFO SparkContext: Starting job: main at
NativeMethodAccessorImpl.java:0
21/04/09 18:12:45 INFO DAGScheduler: Got job 5 (main at
NativeMethodAccessorImpl.java:0) with 1 output partitions
21/04/09 18:12:45 INFO DAGScheduler: Final stage: ResultStage 3 (main at
NativeMethodAccessorImpl.java:0)
21/04/09 18:12:45 INFO DAGScheduler: Parents of final stage: List()
21/04/09 18:12:45 INFO DAGScheduler: Missing parents: List()
21/04/09 18:12:45 INFO DAGScheduler: Submitting ResultStage 3
(MapPartitionsRDD[24] at main at NativeMethodAccessorImpl.java:0), which has no
missing parents
21/04/09 18:12:45 INFO MemoryStore: Block broadcast_14 stored as values in
memory (estimated size 7.4 KiB, free 365.6 MiB)
21/04/09 18:12:45 INFO MemoryStore: Block broadcast_14_piece0 stored as
bytes in memory (estimated size 3.6 KiB, free 365.6 MiB)
21/04/09 18:12:45 INFO BlockManagerInfo: Added broadcast_14_piece0 in memory
on hadoop189:37478 (size: 3.6 KiB, free: 366.2 MiB)
21/04/09 18:12:45 INFO SparkContext: Created broadcast 14 from broadcast at
DAGScheduler.scala:1223
21/04/09 18:12:45 INFO DAGScheduler: Submitting 1 missing tasks from
ResultStage 3 (MapPartitionsRDD[24] at main at NativeMethodAccessorImpl.java:0)
(first 15 tasks are for partitions Vector(0))
21/04/09 18:12:45 INFO YarnScheduler: Adding task set 3.0 with 1 tasks
21/04/09 18:12:45 INFO TaskSetManager: Starting task 0.0 in stage 3.0 (TID
2, hadoop189, executor 1, partition 0, NODE_LOCAL, 11942 bytes)
21/04/09 18:12:45 INFO BlockManagerInfo: Added broadcast_14_piece0 in memory
on hadoop189:40069 (size: 3.6 KiB, free: 366.3 MiB)
21/04/09 18:12:46 INFO BlockManagerInfo: Added broadcast_12_piece0 in memory
on hadoop189:40069 (size: 53.9 KiB, free: 366.2 MiB)
21/04/09 18:12:46 INFO BlockManagerInfo: Added broadcast_13_piece0 in memory
on hadoop189:40069 (size: 116.0 B, free: 366.2 MiB)
21/04/09 18:12:47 INFO TaskSetManager: Finished task 0.0 in stage 3.0 (TID
2) in 2644 ms on hadoop189 (executor 1) (1/1)
21/04/09 18:12:47 INFO YarnScheduler: Removed TaskSet 3.0, whose tasks have
all completed, from pool
21/04/09 18:12:47 INFO DAGScheduler: ResultStage 3 (main at
NativeMethodAccessorImpl.java:0) finished in 2.683 s
21/04/09 18:12:47 INFO DAGScheduler: Job 5 is finished. Cancelling potential
speculative or zombie tasks for this job
21/04/09 18:12:47 INFO YarnScheduler: Killing all running tasks in stage 3:
Stage finished
21/04/09 18:12:47 INFO DAGScheduler: Job 5 finished: main at
NativeMethodAccessorImpl.java:0, took 2.687406 s
1 spark
1 flink
Time taken: 2.826 seconds, Fetched 2 row(s)
21/04/09 18:12:47 INFO SparkSQLCLIDriver: Time taken: 2.826 seconds, Fetched
2 row(s)
```
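If catalog-level table caching in the Spark session is indeed what hides the Flink snapshot, another option may be to disable it when launching spark-sql via the `cache-enabled` catalog property. This is a sketch, not verified on 0.11.0 in this environment:

```
# Same launch command as in step 1, with caching disabled for the catalog
# (other --conf flags elided; `cache-enabled` is the only change).
./spark-sql --packages org.apache.iceberg:iceberg-spark3-runtime:0.11.0 \
  ... \
  --conf spark.sql.catalog.hadoop_catalog.cache-enabled=false \
  --master yarn
```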
I would like to understand why the row written by Flink is not visible to spark-sql until Spark performs a write of its own. Please help, thanks.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.