RushabhK commented on issue #9801: URL: https://github.com/apache/incubator-gluten/issues/9801#issuecomment-2965552203
Hello @FelixYBW, I have a question about the file output committers in Gluten. I was using the Manifest committer in one of my jobs with Gluten, and in the driver logs I could see that every `manifest.json` reported a file count of 0 and a data size of 0, so the Manifest committer was not doing anything with Gluten. So my question is: who renames / moves these files in Gluten when the Manifest committer is configured? Does Gluten have its own mechanism for handling commits?

These are the driver logs I could see when running with Gluten:

```
25/06/11 15:37:21 [main] INFO PathOutputCommitterFactory: Using OutputCommitter factory class class org.apache.hadoop.mapreduce.lib.output.committer.manifest.ManifestCommitterFactory from key mapreduce.outputcommitter.factory.scheme.gs
25/06/11 15:37:21 [main] INFO ManifestCommitter: Created ManifestCommitter with JobID job_202506111537213028082964935343887_0000, Task Attempt attempt_202506111537213028082964935343887_0000_m_000000_0 and destination gs://<some_path>/.spark-staging-a0487498-59b2-4317-a70f-b72f303e3bfb
25/06/11 15:48:52 [main] INFO AbstractJobOrTaskStage: [Job-Attempt fa204b99-986e-4d65-8912-a107afc4105c/00]: Executing Stage job_stage_load_manifests
25/06/11 15:48:52 [main] INFO LoadManifestsStage: [Job-Attempt fa204b99-986e-4d65-8912-a107afc4105c/00]: Executing Manifest Job Commit with manifests in gs://<some_path>/.spark-staging-a0487498-59b2-4317-a70f-b72f303e3bfb/_temporary/fa204b99-986e-4d65-8912-a107afc4105c/00/manifests
25/06/11 15:48:52 [manifest-committer-fa204b99-986e-4d65-8912-a107afc4105c_0-31] INFO LoadManifestsStage: [Job-Attempt fa204b99-986e-4d65-8912-a107afc4105c/00]: Task Attempt attempt_202506111537213403690146387912535_0002_m_000031_1633 file gs://<some_path>/.spark-staging-a0487498-59b2-4317-a70f-b72f303e3bfb/_temporary/fa204b99-986e-4d65-8912-a107afc4105c/00/manifests/task_202506111537213403690146387912535_0002_m_000031-manifest.json: File count: 0; data size=0
25/06/11 15:48:52 [manifest-committer-fa204b99-986e-4d65-8912-a107afc4105c_0-24] INFO LoadManifestsStage: [Job-Attempt fa204b99-986e-4d65-8912-a107afc4105c/00]: Task Attempt attempt_202506111537213403690146387912535_0002_m_000024_1695 file gs://<some_path>/.spark-staging-a0487498-59b2-4317-a70f-b72f303e3bfb/_temporary/fa204b99-986e-4d65-8912-a107afc4105c/00/manifests/task_202506111537213403690146387912535_0002_m_000024-manifest.json: File count: 0; data size=0
25/06/11 15:48:52 [manifest-committer-fa204b99-986e-4d65-8912-a107afc4105c_0-16] INFO LoadManifestsStage: [Job-Attempt fa204b99-986e-4d65-8912-a107afc4105c/00]: Task Attempt attempt_202506111537213403690146387912535_0002_m_000016_1683 file gs://<some_path>/.spark-staging-a0487498-59b2-4317-a70f-b72f303e3bfb/_temporary/fa204b99-986e-4d65-8912-a107afc4105c/00/manifests/task_202506111537213403690146387912535_0002_m_000016-manifest.json: File count: 0; data size=0
25/06/11 15:48:52 [manifest-committer-fa204b99-986e-4d65-8912-a107afc4105c_0-23] INFO LoadManifestsStage: [Job-Attempt fa204b99-986e-4d65-8912-a107afc4105c/00]: Task Attempt attempt_202506111537213403690146387912535_0002_m_000023_1682 file gs://<some_path>/.spark-staging-a0487498-59b2-4317-a70f-b72f303e3bfb/_temporary/fa204b99-986e-4d65-8912-a107afc4105c/00/manifests/task_202506111537213403690146387912535_0002_m_000023-manifest.json: File count: 0; data size=0
25/06/11 15:48:52 [manifest-committer-fa204b99-986e-4d65-8912-a107afc4105c_0-25] INFO LoadManifestsStage: [Job-Attempt fa204b99-986e-4d65-8912-a107afc4105c/00]: Task Attempt attempt_202506111537213403690146387912535_0002_m_000025_1630 file gs://<some_path>/.spark-staging-a0487498-59b2-4317-a70f-b72f303e3bfb/_temporary/fa204b99-986e-4d65-8912-a107afc4105c/00/manifests/task_202506111537213403690146387912535_0002_m_000025-manifest.json: File count: 0; data size=0
25/06/11 15:48:52 [manifest-committer-fa204b99-986e-4d65-8912-a107afc4105c_0-26] INFO LoadManifestsStage: [Job-Attempt fa204b99-986e-4d65-8912-a107afc4105c/00]: Task Attempt attempt_202506111537213403690146387912535_0002_m_000026_1636 file gs://<some_path>/.spark-staging-a0487498-59b2-4317-a70f-b72f303e3bfb/_temporary/fa204b99-986e-4d65-8912-a107afc4105c/00/manifests/task_202506111537213403690146387912535_0002_m_000026-manifest.json: File count: 0; data size=0
25/06/11 15:48:52 [manifest-committer-fa204b99-986e-4d65-8912-a107afc4105c_0-7] INFO LoadManifestsStage: [Job-Attempt fa204b99-986e-4d65-8912-a107afc4105c/00]: Task Attempt attempt_202506111537213403690146387912535_0002_m_000007_1824 file gs://<some_path>/.spark-staging-a0487498-59b2-4317-a70f-b72f303e3bfb/_temporary/fa204b99-986e-4d65-8912-a107afc4105c/00/manifests/task_202506111537213403690146387912535_0002_m_000007-manifest.json: File count: 0; data size=0
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
