[
https://issues.apache.org/jira/browse/CALCITE-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17069217#comment-17069217
]
neoremind edited comment on CALCITE-3878 at 3/28/20, 7:59 AM:
--------------------------------------------------------------
I didn't expect to bring so deep discussion on this. Sorry to bring so much
trouble to you guys, we should more focus on Calcite itself.
[~julianhyde] , I agree that for high level language, code clarity matters a
lot. The example you mentioned:
{code:java}
@Override public RelNode convert(RelNode rel) {
LogicalJoin join = (LogicalJoin) rel;
List<RelNode> newInputs = new ArrayList<>(join.getInputs().size());
for (RelNode input : join.getInputs()) {
{code}
Indeed, the wasted space is not a big deal.Like [~vladimirsitnikov] pointed
out, in cases like with so many selected expressions, might be a good reason.
So the debate make me think what shall we do in the future as a principle?
Allow new ArrayList with capacity or not?
I search on existing code, there are already mixed usages. For example,
1) Specify constant capacity.
{code:java}
JdbcTable:
public SqlIdentifier tableName() {
final List<String> names = new ArrayList<>(3);
if (jdbcSchema.catalog != null) {
names.add(jdbcSchema.catalog);
}
if (jdbcSchema.schema != null) {
names.add(jdbcSchema.schema);
}
names.add(jdbcTableName);
return new SqlIdentifier(names, SqlParserPos.ZERO);
}
{code}
2) Specify 0 capacity.
{code:java}
RexProgram:
for (RelCollation collation : inputCollations) {
final List<RelFieldCollation> fieldCollations = new ArrayList<>(0);
for (RelFieldCollation fieldCollation : collation.getFieldCollations()) {
final int source = fieldCollation.getFieldIndex();
final int target = targets[source];
if (target < 0) {
continue loop;
}
fieldCollations.add(fieldCollation.withFieldIndex(target));
}
{code}
3) Specify capacity with exact size.
{code:java}
HepPlanner.java:
final List<RelNode> children = new ArrayList<>(childRels.size());
for (HepRelVertex childRel : childRels) {
children.add(childRel.getCurrentRel());
}
{code}
f we leave it as what happens now, not care too much, then I hope this issue
could be help people think of performance as well as clean code in the same
time. (Just row I notice people use
_description.matches("[A-Za-z][-A-Za-z0-9_.(),\\[\\]\\s:]*")_ in rule
constructor, which should be make the +Patter.complie+ out :) sorry for
nitpicking).
was (Author: neoremind):
I didn't expect to bring so deep discussion on this. Sorry to bring so much
trouble to you guys, we should more focus on Calcite itself.
[~julianhyde] , I agree that for high level language, code clarity matters a
lot. The example you mentioned:
{code:java}
@Override public RelNode convert(RelNode rel) {
LogicalJoin join = (LogicalJoin) rel;
List<RelNode> newInputs = new ArrayList<>(join.getInputs().size());
for (RelNode input : join.getInputs()) {
{code}
Indeed, the wasted space is not a big deal.Like [~vladimirsitnikov] pointed
out, in cases like with so many selected expressions, might be a good reason.
So the debate make me think what shall we do in the future as a principle?
Allow new ArrayList with capacity or not?
I search on existing code, there are already mixed usages. For example,
1) Specify constant capacity.
{code:java}
JdbcTable:
public SqlIdentifier tableName() {
final List<String> names = new ArrayList<>(3);
if (jdbcSchema.catalog != null) {
names.add(jdbcSchema.catalog);
}
if (jdbcSchema.schema != null) {
names.add(jdbcSchema.schema);
}
names.add(jdbcTableName);
return new SqlIdentifier(names, SqlParserPos.ZERO);
}
{code}
2) Specify 0 capacity.
{code:java}
RexProgram:
for (RelCollation collation : inputCollations) {
final List<RelFieldCollation> fieldCollations = new ArrayList<>(0);
for (RelFieldCollation fieldCollation : collation.getFieldCollations()) {
final int source = fieldCollation.getFieldIndex();
final int target = targets[source];
if (target < 0) {
continue loop;
}
fieldCollations.add(fieldCollation.withFieldIndex(target));
}
{code}
3) Specify capacity with exact size.
{code:java}
HepPlanner.java:
final List<RelNode> children = new ArrayList<>(childRels.size());
for (HepRelVertex childRel : childRels) {
children.add(childRel.getCurrentRel());
}
{code}
For lambda or functional way, this definitely can make code more readable, but
I am a little conservative for that in fundamental framework, there is no
silver bullet, this can be a tradeoff to performance, even small.
Small things accumulate and can grow big, in Java, especially commonly used in
big data distributed system, the bottleneck is usually IO and make us neglect
the trivial small things. Let's get backed on the topic, *what shall we do in
the future as a principle? Allow new ArrayList with capacity or not?* If we
leave it as what happens before, not care too much, then I hope this issue
could be help people think of Java api performance issue in the long-term.
(Just row I notice people use +if
(!description.matches("[A-Za-z][-A-Za-z0-9_.(),\\[
]
s:]*"))+ in rule constructor, which should be make the +Patter.complie+ out :)
sorry for nitpicking).
> Make ArrayList creation with initial capacity when size is fixed
> ----------------------------------------------------------------
>
> Key: CALCITE-3878
> URL: https://issues.apache.org/jira/browse/CALCITE-3878
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Affects Versions: 1.22.0
> Reporter: neoremind
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> I find many places in Calcite where _new ArrayList<>()_ is used, if the list
> is expected to be immutable or not resizing, it is always a good manner to
> create with initial capacity, better for memory usage and performance.
> I search all occurrences, focus on the core module, to make it safe, I only
> update local variables with fixed size and not working in recursive method.
> If the local variable reference goes out of scope, if resizing is needed,
> things will work normally as well, so no side effect, but for the "escaping"
> case, I am very conservative and do not change them.
>
>
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)