HisZ opened a new issue, #5883:
URL: https://github.com/apache/kyuubi/issues/5883

   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   
   
   ### Search before asking
   
   - [X] I have searched in the [issues](https://github.com/apache/kyuubi/issues?q=is%3Aissue) and found no similar issues.
   
   
   ### Describe the bug
   
   In YARN mode, when I execute two `INSERT OVERWRITE` SQL statements in the same Beeline session, the two process entities are merged together, and so is the lineage.
   
   Expected lineage for SQL1:
   
![expect1](https://github.com/apache/kyuubi/assets/36467007/f56fe51d-6562-4830-97d0-4b6a947f3c42)
   
   Expected lineage for SQL2:
   
![expect2](https://github.com/apache/kyuubi/assets/36467007/5402246e-d548-4b73-a74a-2a8d3b2c8b7b)
   
   Actual result:
   
![actually](https://github.com/apache/kyuubi/assets/36467007/4d2560a2-c786-41b0-99e2-e09e1764b08c)
   
   Both statements end up under the same process entity. How do I separate them?
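
   A minimal reproduction sketch of such a session, using hypothetical tables (`src`, `dst_a`, `dst_b`) in place of the real ones:

   ```sql
   -- One Beeline session, two independent INSERT OVERWRITE statements.
   -- Table names are placeholders for illustration only.
   INSERT OVERWRITE TABLE dst_a SELECT id, name FROM src;  -- expected: process entity 1
   INSERT OVERWRITE TABLE dst_b SELECT id, name FROM src;  -- expected: process entity 2
   -- Observed: Atlas records a single process entity covering both writes,
   -- so the lineage of dst_a and dst_b is merged.
   ```

   As an untested workaround guess, running each statement in its own connection (for example with `kyuubi.engine.share.level=CONNECTION`, which gives every connection a dedicated engine) might keep the process entities apart, but that only sidesteps the merge rather than fixing it.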
   
   ### Affects Version(s)
   
   1.8
   
   ### Kyuubi Server Log Output
   
   ```
   No error logs.
   ```
   
   
   ### Kyuubi Engine Log Output
   
   ```
   No error logs.
   ```
   
   
   ### Kyuubi Server Configurations
   
   ```yaml
   kyuubi.zookeeper.embedded.client.port                21810
   hive.metastore.uris                          thrift://manager2.bigdata:9083
   
   
   # Spark conf
   spark.sql.adaptive.enabled=true
   spark.sql.adaptive.forceApply=false
   spark.sql.adaptive.logLevel=info
   spark.sql.adaptive.advisoryPartitionSizeInBytes=256m
   spark.sql.adaptive.coalescePartitions.enabled=true
   spark.sql.adaptive.coalescePartitions.minPartitionNum=1
   spark.sql.adaptive.coalescePartitions.initialPartitionNum=50
   spark.sql.adaptive.fetchShuffleBlocksInBatch=true
   spark.sql.adaptive.localShuffleReader.enabled=true
   spark.sql.adaptive.skewJoin.enabled=true
   spark.sql.adaptive.skewJoin.skewedPartitionFactor=5
   spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes=400m
   spark.sql.adaptive.nonEmptyPartitionRatioForBroadcastJoin=0.2
   spark.sql.adaptive.optimizer.excludedRules
   spark.sql.autoBroadcastJoinThreshold=-1
   
   spark.dynamicAllocation.enabled=true
   ## set to false if shuffle tracking is preferred over ESS
   spark.shuffle.service.enabled=true
   spark.dynamicAllocation.initialExecutors=10
   spark.dynamicAllocation.minExecutors=10
   spark.dynamicAllocation.maxExecutors=500
   spark.dynamicAllocation.executorAllocationRatio=0.5
   spark.dynamicAllocation.executorIdleTimeout=60s
   spark.dynamicAllocation.cachedExecutorIdleTimeout=30min
   # set to true if shuffle tracking is preferred over ESS
   spark.dynamicAllocation.shuffleTracking.enabled=false
   spark.dynamicAllocation.shuffleTracking.timeout=30min
   spark.dynamicAllocation.schedulerBacklogTimeout=1s
   spark.dynamicAllocation.sustainedSchedulerBacklogTimeout=1s
   spark.cleaner.periodicGC.interval=5min
   spark.sql.optimizer.insertRepartitionBeforeWriteIfNoShuffle.enabled=true
   
   
   
   # DolphinScheduler
   ___DolphinScheduler___.kyuubi.session.engine.idle.timeout=PT15S
   
   # atlas configuration
   spark.atlas.client.password=admin
   spark.atlas.client.type=rest
   spark.atlas.client.username=admin
   spark.atlas.cluster.name=primary
   spark.atlas.hook.spark.column.lineage.enabled=true
   spark.atlas.rest.address=http://worker5.bigdata:21000
   spark.kyuubi.plugin.lineage.dispatchers=ATLAS
   ```
   
   
   ### Kyuubi Engine Configurations
   
   ```yaml
   spark.atlas.client.password=admin
   spark.atlas.client.type=rest
   spark.atlas.client.username=admin
   spark.atlas.cluster.name=primary
   spark.atlas.hook.spark.column.lineage.enabled=true
   spark.atlas.rest.address=http://worker5.bigdata:21000
   spark.driver.extraJavaOptions=-Dhdp.version=3.0.1.0-187
   spark.driver.extraLibraryPath=/usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64
   spark.eventLog.dir=hdfs:///spark3-history
   spark.eventLog.enabled=true
   spark.executor.extraJavaOptions=-XX:+UseNUMA -XX:+UseG1GC
   spark.executor.extraLibraryPath=/usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64
   spark.files=/usr/hdp/3.0.1.0-187/spark3/spark-3.1.1/conf/atlas-application.properties
   spark.history.fs.cleaner.enabled=true
   spark.history.fs.cleaner.interval=14d
   spark.history.fs.cleaner.maxAge=180d
   spark.history.fs.logDirectory=hdfs:///spark3-history/
   spark.history.kerberos.keytab=none
   spark.history.kerberos.principal=none
   spark.history.provider=org.apache.spark.deploy.history.FsHistoryProvider
   spark.history.ui.port=28081
   spark.kyuubi.plugin.lineage.dispatchers=ATLAS
   spark.kyuubi.plugin.lineage.skip.parsing.permanent.view.enabled=true
   spark.master=yarn
   spark.sql.queryExecutionListeners=org.apache.kyuubi.plugin.lineage.SparkOperationLineageQueryExecutionListener
   spark.sql.warehouse.dir=/apps/spark/warehouse
   spark.yarn.am.extraJavaOptions=-Dhdp.version=3.0.1.0-187
   spark.yarn.historyServer.address=manager2.bigdata:28081
   spark.yarn.queue=default
   ```
   
   
   ### Additional context
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes. I would be willing to submit a PR with guidance from the Kyuubi community to fix.
   - [ ] No. I cannot submit a PR at this time.

