Thanks for the information. I will take a look.

Best Regards,
Pulasthi

On Fri, Nov 15, 2019 at 2:07 PM Luke Cwik <[email protected]> wrote:

> They are serialized but not with Java serialization. There is a
> CloudObject serialization[1] layer that only Dataflow uses while all other
> runners who need to serialize are using the Coder -> Proto serialization
> layer[2]. The CloudObject representation is slated for deletion once we can
> migrate Dataflow to the pure proto representation. Feel free to send a PR
> to improve the wording in the Javadoc.
>
> 1:
> https://github.com/apache/beam/blob/5de27d5c0cb86962e28b5168b7f1dec62352230b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/util/CloudObjects.java#L87
> 2:
> https://github.com/apache/beam/blob/5de27d5c0cb86962e28b5168b7f1dec62352230b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/CoderTranslation.java#L68
>
> On Fri, Nov 15, 2019 at 11:00 AM Pulasthi Supun Wickramasinghe <
> [email protected]> wrote:
>
>> Hi Luke,
>>
>> Aren't the coders supposed to be serializable? The doc on the Coder
>> interface has the following java doc comment, which seems to mean that they
>> should be, and most of the basic coders seem to serializable.
>>
>> " {@link Coder} instances are serialized during job creation and
>> deserialized before use. This will generally be performed by serializing
>> the object via Java Serialization."
>>
>> However "TimestampedValueCoder" does not have a default constructor so it
>> cannot be deserialized.  adding a private non-arg constructor would solve
>> the problem, but this may not be the only class that has this issue. I can
>> work on a pull request to add the private non-args constructors to coders
>> that have them missing if this was not done intentionally. WDYT?
>>
>> Best Regards,
>> Pulasthi
>>
>> On Fri, Nov 15, 2019 at 12:05 AM Pulasthi Supun Wickramasinghe <
>> [email protected]> wrote:
>>
>>> Hi Luke,
>>>
>>> That is the approach i am taking currently to handle the functions. I
>>> Might have to do the same for Coders as well since some coders have the
>>> same issue of not having default constructors.
>>>
>>> I also initially considered converting the pipeline into a JSON format
>>> and sending that over to the workers, Will take a look at the option you
>>> have mentioned since we do plan to implement a Portable pipeline runner for
>>> Twister2 as well. Thanks for the information
>>>
>>> Best Regards,
>>> Pulasthi
>>>
>>>
>>> On Thu, Nov 14, 2019 at 2:30 PM Luke Cwik <[email protected]> wrote:
>>>
>>>> You should create placeholders inside of your Twister2/OpenMPI
>>>> implementation that represent these functions and then instantiate actual
>>>> instances of them on the workers if you want to write your own pipeline
>>>> representation and format for OpenMPI/Twister2.
>>>>
>>>> Or consider converting the pipeline to its proto representation and
>>>> building a portable pipeline runner. This way you could run Go and Python
>>>> pipelines as well. The best example of this is the current Flink
>>>> integration[1]
>>>>
>>>> 1:
>>>> https://github.com/apache/beam/blob/master/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkBatchPortablePipelineTranslator.java
>>>>
>>>> On Wed, Nov 13, 2019 at 7:44 PM Pulasthi Supun Wickramasinghe <
>>>> [email protected]> wrote:
>>>>
>>>>> Hi Dev's
>>>>>
>>>>> Currently, the Pipeline class in Beam is not Serializable. This is not
>>>>> a problem for the current runners since the pipeline is translated and
>>>>> submitted through a centralized Driver like model. However, if the runner
>>>>> has a decentralized model similar to OpenMPI (MPI), which is also the case
>>>>> with Twister2, which I am developing a runner currently, it would have 
>>>>> been
>>>>> better if the pipeline itself was Serializable.
>>>>>
>>>>> Currently, I am trying to transform the Pipeline into a Twister2 graph
>>>>> and then send over to the workers, however since there are some functions
>>>>> such as "SystemReduceFn" that are not serializable this also is somewhat
>>>>> troublesome.
>>>>>
>>>>> Was the decision to make Pipelines not Serializable made due to some
>>>>> specific reason or because all the current use cases did not present any
>>>>> valid requirement to make them Serializable?
>>>>>
>>>>> Best Regards,
>>>>> Pulasthi
>>>>> --
>>>>> Pulasthi S. Wickramasinghe
>>>>> PhD Candidate  | Research Assistant
>>>>> School of Informatics and Computing | Digital Science Center
>>>>> Indiana University, Bloomington
>>>>> cell: 224-386-9035 <(224)%20386-9035>
>>>>>
>>>>
>>>
>>> --
>>> Pulasthi S. Wickramasinghe
>>> PhD Candidate  | Research Assistant
>>> School of Informatics and Computing | Digital Science Center
>>> Indiana University, Bloomington
>>> cell: 224-386-9035 <(224)%20386-9035>
>>>
>>
>>
>> --
>> Pulasthi S. Wickramasinghe
>> PhD Candidate  | Research Assistant
>> School of Informatics and Computing | Digital Science Center
>> Indiana University, Bloomington
>> cell: 224-386-9035 <(224)%20386-9035>
>>
>

-- 
Pulasthi S. Wickramasinghe
PhD Candidate  | Research Assistant
School of Informatics and Computing | Digital Science Center
Indiana University, Bloomington
cell: 224-386-9035

Reply via email to