[
https://issues.apache.org/jira/browse/BEAM-6276?focusedWorklogId=178318&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-178318
]
ASF GitHub Bot logged work on BEAM-6276:
----------------------------------------
Author: ASF GitHub Bot
Created on: 23/Dec/18 11:28
Start Date: 23/Dec/18 11:28
Worklog Time Spent: 10m
Work Description: kanterov commented on issue #7331: [BEAM-6276] Fix
performance regression.
URL: https://github.com/apache/beam/pull/7331#issuecomment-449630174
@reuvenlax I agree that it isn't a problem for graph construction. I didn't
elaborate properly, but the reason to have a deterministic schema is to pass it
to a method like:
```
class FromRowUsingCreatorGenerator {
  public static <T> FromRowUsingCreator<T> generate(Class<T> clazz, Schema schema);
}
```
and then generate byte code based on a schema and class:
```java
public class GeneratedSchemaUserTypeCreator
    extends SerializableFunction4<Object, Object, Object, Object, JavaBean>
    implements UserTypeCreatorFactory {
  private final FieldValueSetter[] setters;

  Object apply(Object... args) { // for UserTypeCreatorFactory
    return apply(args[0], args[1], args[2], args[3]);
  }

  Object apply(Object p0, Object p1, Object p2, Object p3) { // faster
    // we don't use newInstance; instead we just generate byte code with a
    // constructor call
    Object object = new JavaBean();
    setters[0].set(object, p0);
    setters[1].set(object, p1);
    setters[2].set(object, p2);
    setters[3].set(object, p3);
    return object;
  }
}

public class GeneratedFromRowUsingCreator extends FromRowUsingCreator<JavaBean> {
  private final SerializableFunction4 creator;
  private final FromRowUsingCreator<InnerJavaBean> underlying1; // for field_1

  GeneratedFromRowUsingCreator(Schema schema) {
    // we know that it is SerializableFunction4 because there are 4 fields in
    // the schema
    creator = (SerializableFunction4) schemaTypeCreatorFactory.create(JavaBean.class, schema);
  }

  public JavaBean apply(Row row) {
    // calling .apply(Object p0, ..., Object p3) is much faster than
    // `.apply(Object... params)` due to JIT
    return (JavaBean) creator.apply(
        row.getValue(0),
        // byte code will contain a call to the underlying FromRowUsingCreator
        // only in the case of a ROW, MAP or ARRAY field
        underlying1.apply(row.getRow(1)),
        row.getValue(2),
        row.getValue(3));
  }
}
```
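To make the fixed-arity versus varargs point in the code comments concrete, here is a minimal hand-written sketch of the two call shapes. `Function4`, `Bean4`, `Bean4Creator`, and `CreatorDemo` are hypothetical names invented for this illustration; they are not Beam classes, and real generated byte code would look different.

```java
// Hypothetical stand-in for a four-argument function type.
interface Function4<A, B, C, D, R> {
  R apply(A a, B b, C c, D d);
}

// Hypothetical bean with four fields, standing in for JavaBean.
final class Bean4 {
  Object f0;
  Object f1;
  Object f2;
  Object f3;
}

// Hand-written equivalent of what a generated creator might look like.
final class Bean4Creator implements Function4<Object, Object, Object, Object, Bean4> {

  // Generic entry point: every call site packs its arguments into an Object[].
  Bean4 applyVarargs(Object... args) {
    return apply(args[0], args[1], args[2], args[3]);
  }

  // Fixed-arity entry point: a direct constructor call followed by field
  // writes, with no argument array and no reflection.
  @Override
  public Bean4 apply(Object p0, Object p1, Object p2, Object p3) {
    Bean4 bean = new Bean4();
    bean.f0 = p0;
    bean.f1 = p1;
    bean.f2 = p2;
    bean.f3 = p3;
    return bean;
  }
}

final class CreatorDemo {
  public static void main(String[] args) {
    Bean4Creator creator = new Bean4Creator();
    Bean4 direct = creator.apply("id", 1, 2L, true);
    Bean4 viaArray = creator.applyVarargs("id", 1, 2L, true);
    // Both paths populate the bean identically; only the call shape differs.
    System.out.println(direct.f0.equals(viaArray.f0) && direct.f3.equals(viaArray.f3));
  }
}
```

At the bytecode level the varargs call site creates an `Object[]` per call, while the fixed-arity call passes its arguments directly. Whether the JIT removes that difference in a given workload is something a benchmark would have to show.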
When I ran the benchmark, I used JMH and didn't include any costs that we pay
once per pipeline construction.
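One way a one-time generation cost could be kept out of the per-element path is to cache the generated creator by class and schema. The sketch below is only an assumption about how that might look, not what the PR does: `CreatorKey` and `CreatorCache` are hypothetical, a list of field names stands in for `Schema`, and `generate` is a placeholder for byte code generation. It also shows why a deterministic schema matters, since a different field order yields a different key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical cache key: a class plus the ordered field names of its schema.
final class CreatorKey {
  final Class<?> clazz;
  final List<String> fieldNames;

  CreatorKey(Class<?> clazz, List<String> fieldNames) {
    this.clazz = clazz;
    this.fieldNames = Collections.unmodifiableList(new ArrayList<>(fieldNames));
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof CreatorKey)) {
      return false;
    }
    CreatorKey other = (CreatorKey) o;
    return clazz.equals(other.clazz) && fieldNames.equals(other.fieldNames);
  }

  @Override
  public int hashCode() {
    return Objects.hash(clazz, fieldNames);
  }
}

// Hypothetical cache: the expensive generation step runs once per distinct key.
final class CreatorCache {
  private final Map<CreatorKey, Function<Object[], Object>> cache = new ConcurrentHashMap<>();
  final AtomicInteger generations = new AtomicInteger(); // counts generate() calls

  Function<Object[], Object> get(Class<?> clazz, List<String> fieldNames) {
    return cache.computeIfAbsent(new CreatorKey(clazz, fieldNames), key -> {
      generations.incrementAndGet();
      return generate(key);
    });
  }

  // Placeholder for the expensive step; here it only checks the arity and
  // returns the values as a list instead of building a real object.
  private static Function<Object[], Object> generate(CreatorKey key) {
    int arity = key.fieldNames.size();
    return values -> {
      if (values.length != arity) {
        throw new IllegalArgumentException("expected " + arity + " values");
      }
      return Arrays.asList(values.clone());
    };
  }
}
```

With this shape, two lookups that use the same class and the same field order share one generated creator, while a reordered field list triggers a second generation.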
Issue Time Tracking
-------------------
Worklog Id: (was: 178318)
Time Spent: 2h 40m (was: 2.5h)
> Performance regression caused by extra calls to TypeDescriptor.getRawType
> -------------------------------------------------------------------------
>
> Key: BEAM-6276
> URL: https://issues.apache.org/jira/browse/BEAM-6276
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-core
> Reporter: Reuven Lax
> Assignee: Reuven Lax
> Priority: Major
> Time Spent: 2h 40m
> Remaining Estimate: 0h
>