[ https://issues.apache.org/jira/browse/PHOENIX-6721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17794559#comment-17794559 ]

ASF GitHub Bot commented on PHOENIX-6721:
-----------------------------------------

stoty commented on code in PR #1450:
URL: https://github.com/apache/phoenix/pull/1450#discussion_r1420042762


##########
phoenix-core/src/main/java/org/apache/phoenix/mapreduce/MultiHfileOutputFormat.java:
##########
@@ -122,11 +123,11 @@ public RecordWriter<TableRowkeyPair, Cell> getRecordWriter(TaskAttemptContext co
      * @return
      * @throws IOException 
      */
-    static <V extends Cell> RecordWriter<TableRowkeyPair, V> createRecordWriter(final TaskAttemptContext context)
+    static <V extends Cell> RecordWriter<TableRowkeyPair, V> createRecordWriter(
+        final TaskAttemptContext context, final OutputCommitter committer)
             throws IOException {
         // Get the path of the temporary output file
-        final Path outputPath = FileOutputFormat.getOutputPath(context);
-        final Path outputdir = new FileOutputCommitter(outputPath, context).getWorkPath();
+        final Path outputdir = ((PathOutputCommitter) committer).getOutputPath();

Review Comment:
   @ss77892 
   
   This indeed looks incorrect.
   We should use .getWorkPath() here.
   
   This works for the S3A magic committer, where the work and output paths
   are the same, but for FileOutputCommitter (i.e. HDFS) and others it breaks
   the commit mechanism by writing into the output directory directly.
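   For illustration, a minimal sketch of what the suggested fix amounts to,
   assuming the committer handed to createRecordWriter is a PathOutputCommitter
   as in the diff above (resolveWorkDir is a hypothetical helper, not code from
   the PR):

       import java.io.IOException;

       import org.apache.hadoop.fs.Path;
       import org.apache.hadoop.mapreduce.OutputCommitter;
       import org.apache.hadoop.mapreduce.lib.output.PathOutputCommitter;

       final class WorkPathSketch {
           /** Hypothetical helper: resolve the directory a task should write into. */
           static Path resolveWorkDir(OutputCommitter committer) throws IOException {
               if (committer instanceof PathOutputCommitter) {
                   // getWorkPath() is the task-attempt staging directory; the
                   // commit step later promotes its contents to the final output
                   // path. For the S3A magic committer the two may coincide, but
                   // for FileOutputCommitter on HDFS they differ, so writing to
                   // getOutputPath() directly bypasses the commit mechanism.
                   return ((PathOutputCommitter) committer).getWorkPath();
               }
               throw new IOException("Expected a PathOutputCommitter but got "
                       + committer.getClass().getName());
           }

           private WorkPathSketch() {
           }
       }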





> CSV bulkload tool fails with FileNotFoundException if --output points to the 
> S3 location
> ----------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-6721
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-6721
>             Project: Phoenix
>          Issue Type: Bug
>          Components: core
>            Reporter: Sergey Soldatov
>            Assignee: Sergey Soldatov
>            Priority: Major
>
> We were trying to use the CSV bulkload tool with HBase/Phoenix running on top 
> of AWS S3 and found that once the --output parameter points to an S3 
> location, the job fails with a FileNotFoundException (FNFE).
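
For context, an invocation along these lines reproduces the failure
(illustrative only: the jar name, table, input path, and bucket are
placeholders, and an s3a:// filesystem is assumed to be configured):

    hadoop jar phoenix-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool \
        --table EXAMPLE --input /data/example.csv \
        --output s3a://my-bucket/tmp/bulkload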



--
This message was sent by Atlassian Jira
(v8.20.10#820010)