Yep, that's a common misunderstanding with Beam.

The code that is actually executed in the try block is just for pipeline
construction; no data is processed at this point in time.

Once the pipeline is constructed, the various ParDos are serialized and
sent to the runner, where they are actually executed.
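The same construction-vs-execution split can be seen in plain Java, with a
Runnable standing in for a ParDo (a rough analogy, no Beam APIs involved):
a try-catch around the *construction* step catches nothing, because the
failing body only runs later.

```java
import java.util.ArrayList;
import java.util.List;

public class DeferredExecutionDemo {

    // Builds a "pipeline" of deferred stages, then runs it.
    // Returns {caughtDuringConstruction, caughtDuringExecution}.
    static boolean[] run() {
        List<Runnable> stages = new ArrayList<>();
        boolean caughtDuringConstruction = false;
        boolean caughtDuringExecution = false;

        // "Construction": we only describe the work; the lambda body never runs here.
        try {
            stages.add(() -> { throw new RuntimeException("boom at run time"); });
        } catch (RuntimeException e) {
            caughtDuringConstruction = true; // never reached
        }

        // "Execution": the failure only surfaces when the stage actually runs.
        try {
            for (Runnable stage : stages) {
                stage.run();
            }
        } catch (RuntimeException e) {
            caughtDuringExecution = true;
        }
        return new boolean[] {caughtDuringConstruction, caughtDuringExecution};
    }

    public static void main(String[] args) {
        boolean[] r = run();
        System.out.println(r[0] + " " + r[1]); // prints: false true
    }
}
```

In real Beam the gap is even wider: the stages are serialized and run on
remote workers, so the construction-time try-catch in your driver can never
see them fail.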

In this case, if there were an exception in the ParDo that converts Rows to
Avro, you would see the "Exception when converting Beam Row to Avro Record"
message in whatever logs your runner provides, and the exception would
propagate up through your runner.

Relatedly, your log line log.info("Finished writing Parquet file to path {}",
writePath); is misleading: it will log when the pipeline is constructed,
not when the Parquet write completes.
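If you want the failure to surface in your driver program, the try-catch has
to wrap pipeline *execution* rather than construction. A rough sketch, not
runnable as-is (Pipeline.run() and PipelineResult.waitUntilFinish() are
standard Beam Java SDK calls; exact failure behavior varies by runner):

```java
// Sketch: catch execution-time failures by waiting on the pipeline result.
try {
    PipelineResult result = p.run();
    PipelineResult.State state = result.waitUntilFinish(); // blocks until the runner finishes
    if (state != PipelineResult.State.DONE) {
        throw new RuntimeException("Pipeline finished in state " + state);
    }
    log.info("Finished writing Parquet file to path {}", writePath);
} catch (Exception e) {
    log.error("Error in Parquet Write Action. {}", e.getMessage());
    throw e;
}
```

Depending on the runner, waitUntilFinish() may itself throw on a failed
pipeline; checking the returned state covers runners that instead report
failure through the state.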

On Wed, Nov 8, 2023 at 10:51 AM Ramya Prasad via dev <dev@beam.apache.org>
wrote:

> Hey John,
>
> Yes that's how my code is set up, I have the FileIO.write() in its own
> try-catch block. I took a second look at where exactly the code is failing,
> and it's actually in a ParDo function which is happening before I call
> FileIO.write(). But even within that, I've tried adding a try-catch but the
> error isn't stopping the actual application run in a Spark cluster. In the
> cluster, I see that the exception is being thrown from my ParDo, but then
> immediately after that, I see the line "INFO ApplicationMaster: Final app
> status: SUCCEEDED, exitCode: 0". This is roughly what my code setup looks
> like:
>
> @Slf4j
> public class ParquetWriteActionStrategy {
>
>     public void executeWriteAction(Pipeline p) throws Exception {
>
>         try {
>
>             // transform PCollection from type Row to GenericRecords
>             PCollection<GenericRecord> records = p.apply(
>                     "transform PCollection from type Row to GenericRecords",
>                     ParDo.of(new DoFn<Row, GenericRecord>() {
>                         @ProcessElement
>                         public void processElement(@Element Row row,
>                                 OutputReceiver<GenericRecord> out) {
>                             try {
>                                 <convert Row to Avro Record>
>                             } catch (Exception e) {
>                                 log.error("Exception when converting Beam Row to Avro Record: {}", e.getMessage());
>                                 throw e;
>                             }
>                         }
>                     })).setCoder(AvroCoder.of(avroSchema));
>
>             records.apply("Writing Parquet Output File",
>                     FileIO.<GenericRecord>write()
>                             .via(<schema>)
>                             .to(writePath)
>                             .withSuffix(".parquet"));
>
>             log.info("Finished writing Parquet file to path {}", writePath);
>         } catch (Exception e) {
>             log.error("Error in Parquet Write Action. {}", e.getMessage());
>             throw e;
>         }
>     }
> }
>
>
> On Wed, Nov 8, 2023 at 9:16 AM John Casey via dev <dev@beam.apache.org>
> wrote:
>
>> There are 2 execution times when using Beam. The first execution is
>> local, when a pipeline is constructed, and the second is remote on the
>> runner, processing data.
>>
>> Based on what you said, it sounds like you are wrapping pipeline
>> construction in a try-catch, and constructing FileIO isn't failing.
>>
>> e.g.
>>
>> try {
>>
>> FileIO.write().someOtherConfigs()
>>
>> } catch ...
>>
>> This will catch any exceptions thrown while constructing FileIO, but the
>> running pipeline won't propagate its exceptions back through this catch block.
>>
>> On Tue, Nov 7, 2023 at 5:21 PM Robert Bradshaw via dev <
>> dev@beam.apache.org> wrote:
>>
>>> File write failures should be throwing exceptions that will
>>> terminate the pipeline on failure. (Generally a distributed runner will
>>> make multiple attempts before abandoning the entire pipeline of course.)
>>>
>>> Are you seeing files failing to be written but no exceptions being
>>> thrown? If so, this is definitely a bug that we want to resolve.
>>>
>>>
>>> On Tue, Nov 7, 2023 at 11:17 AM Ramya Prasad via dev <
>>> dev@beam.apache.org> wrote:
>>>
>>>> Hello,
>>>>
>>>> I am a developer using Apache Beam in my Java application, and I need
>>>> some help on how to handle exceptions when writing a file to S3. I have
>>>> tried wrapping my code within a try-catch block, but no exception is being
>>>> thrown within the try block. I'm assuming that FileIO doesn't throw any
>>>> exceptions upon failure. Is there a way in which I can either terminate the
>>>> program on failure or at least be made aware of if any of my write
>>>> operations fail?
>>>>
>>>> Thanks and sincerely,
>>>> Ramya
>>>> ------------------------------
>>>>
>>>> The information contained in this e-mail may be confidential and/or
>>>> proprietary to Capital One and/or its affiliates and may only be used
>>>> solely in performance of work or services for Capital One. The information
>>>> transmitted herewith is intended only for use by the individual or entity
>>>> to which it is addressed. If the reader of this message is not the intended
>>>> recipient, you are hereby notified that any review, retransmission,
>>>> dissemination, distribution, copying or other use of, or taking of any
>>>> action in reliance upon this information is strictly prohibited. If you
>>>> have received this communication in error, please contact the sender and
>>>> delete the material from your computer.
