[ 
https://issues.apache.org/jira/browse/CALCITE-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17069217#comment-17069217
 ] 

neoremind edited comment on CALCITE-3878 at 3/28/20, 8:00 AM:
--------------------------------------------------------------

I didn't expect to bring so deep discussion on this. Sorry to bring so much 
trouble to you guys, we should more focus on Calcite itself.

[~julianhyde] , I agree that for high level language, code clarity matters a 
lot. The example you mentioned:
{code:java}
@Override public RelNode convert(RelNode rel) {
    LogicalJoin join = (LogicalJoin) rel;
    List<RelNode> newInputs = new ArrayList<>(join.getInputs().size());
    for (RelNode input : join.getInputs()) {
{code}
Indeed, the wasted space is not a big deal.Like [~vladimirsitnikov] pointed 
out, in cases like with so many selected expressions, might be a good reason.

So the debate make me think what shall we do in the future as a principle? 
Allow new ArrayList with capacity or not?

I search on existing code, there are already mixed usages. For example,

1) Specify constant capacity.
{code:java}
JdbcTable:
  public SqlIdentifier tableName() {
    final List<String> names = new ArrayList<>(3);
    if (jdbcSchema.catalog != null) {
      names.add(jdbcSchema.catalog);
    }
    if (jdbcSchema.schema != null) {
      names.add(jdbcSchema.schema);
    }
    names.add(jdbcTableName);
    return new SqlIdentifier(names, SqlParserPos.ZERO);
  }
{code}
2) Specify 0 capacity.
{code:java}
RexProgram:
  for (RelCollation collation : inputCollations) {
      final List<RelFieldCollation> fieldCollations = new ArrayList<>(0);
      for (RelFieldCollation fieldCollation : collation.getFieldCollations()) {
        final int source = fieldCollation.getFieldIndex();
        final int target = targets[source];
        if (target < 0) {
          continue loop;
        }
        fieldCollations.add(fieldCollation.withFieldIndex(target));
      }
{code}
3) Specify capacity with exact size.
{code:java}
HepPlanner.java:
  final List<RelNode> children = new ArrayList<>(childRels.size());
  for (HepRelVertex childRel : childRels) {
    children.add(childRel.getCurrentRel());
  }
{code}
If we leave it as what happens now, not care too much, then I hope this issue 
could be help people think of performance as well as clean code in the same 
time. (Just row I notice people use 
_description.matches("[A-Za-z][-A-Za-z0-9_.(),\\[\\]
s:]*")_ in rule constructor, which should be make the +Patter.complie+ out :) 
sorry for nitpicking).

 


was (Author: neoremind):
I didn't expect to bring so deep discussion on this. Sorry to bring so much 
trouble to you guys, we should more focus on Calcite itself.

[~julianhyde] , I agree that for high level language, code clarity matters a 
lot. The example you mentioned:
{code:java}
@Override public RelNode convert(RelNode rel) {
    LogicalJoin join = (LogicalJoin) rel;
    List<RelNode> newInputs = new ArrayList<>(join.getInputs().size());
    for (RelNode input : join.getInputs()) {
{code}
Indeed, the wasted space is not a big deal.Like [~vladimirsitnikov] pointed 
out, in cases like with so many selected expressions, might be a good reason.

So the debate make me think what shall we do in the future as a principle? 
Allow new ArrayList with capacity or not?

I search on existing code, there are already mixed usages. For example,

1) Specify constant capacity.
{code:java}
JdbcTable:
  public SqlIdentifier tableName() {
    final List<String> names = new ArrayList<>(3);
    if (jdbcSchema.catalog != null) {
      names.add(jdbcSchema.catalog);
    }
    if (jdbcSchema.schema != null) {
      names.add(jdbcSchema.schema);
    }
    names.add(jdbcTableName);
    return new SqlIdentifier(names, SqlParserPos.ZERO);
  }
{code}
2) Specify 0 capacity.
{code:java}
RexProgram:
  for (RelCollation collation : inputCollations) {
      final List<RelFieldCollation> fieldCollations = new ArrayList<>(0);
      for (RelFieldCollation fieldCollation : collation.getFieldCollations()) {
        final int source = fieldCollation.getFieldIndex();
        final int target = targets[source];
        if (target < 0) {
          continue loop;
        }
        fieldCollations.add(fieldCollation.withFieldIndex(target));
      }
{code}
3) Specify capacity with exact size.
{code:java}
HepPlanner.java:
  final List<RelNode> children = new ArrayList<>(childRels.size());
  for (HepRelVertex childRel : childRels) {
    children.add(childRel.getCurrentRel());
  }
{code}
f we leave it as what happens now, not care too much, then I hope this issue 
could be help people think of performance as well as clean code in the same 
time. (Just row I notice people use 
_description.matches("[A-Za-z][-A-Za-z0-9_.(),\\[\\]\\s:]*")_ in rule 
constructor, which should be make the +Patter.complie+ out :) sorry for 
nitpicking).

 

> Make ArrayList creation with initial capacity when size is fixed
> ----------------------------------------------------------------
>
>                 Key: CALCITE-3878
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3878
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.22.0
>            Reporter: neoremind
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> I find many places in Calcite where _new ArrayList<>()_ is used, if the list 
> is expected to be immutable or not resizing, it is always a good manner to 
> create with initial capacity, better for memory usage and performance.
> I search all occurrences, focus on the core module, to make it safe, I only 
> update local variables with fixed size and not working in recursive method. 
> If the local variable reference goes out of scope, if resizing is needed, 
> things will work normally as well, so no side effect, but for the "escaping" 
> case, I am very conservative and do not change them.
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to