Dmitry Lychagin has posted comments on this change. Change subject: WIP: stddev ......................................................................
Patch Set 4: (18 comments) Looks good. My comments are relatively minor. https://asterix-gerrit.ics.uci.edu/#/c/2990/4//COMMIT_MSG Commit Message: Line 7: WIP: stddev Follow the commit message template. So it should be something like: [ASTERIXDB-2459][FUN] Add sttdev() aggregate function - user model changes: yes - storage format changes: no - interface changes: no Details: copy your details here https://asterix-gerrit.ics.uci.edu/#/c/2990/4/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/aggregate-sql/serial_stddev_double_null/serial_stddev_double_null.2.update.sqlpp File asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/aggregate-sql/serial_stddev_double_null/serial_stddev_double_null.2.update.sqlpp: Line 25: select element {'id':1,'gid':1,'val':double(5.32)}; It looks like the intent was to test cases when valplus is null, but in this line the 'valplus' value is 'missing'. Let's add another record with 'valplus': null, so we test both cases of null and missing. This also applies to other testcases (*_null), we should add 'valplus': null to all of them https://asterix-gerrit.ics.uci.edu/#/c/2990/4/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/aggregate/agg_null_rec/agg_null_rec.3.query.sqlpp File asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/aggregate/agg_null_rec/agg_null_rec.3.query.sqlpp: Line 43: )), 'stddev':test.coll_stddev(( should be "strict_stddev". "coll_" prefix still works but is deprecated https://asterix-gerrit.ics.uci.edu/#/c/2990/4/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/aggregate/agg_null_rec_1/agg_null_rec_1.3.query.sqlpp File asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/aggregate/agg_null_rec_1/agg_null_rec_1.3.query.sqlpp: Line 56: select element i.val should it be i.val (like in avg/sum above) or i.valplus (like in min/max above)? Could you double check? https://asterix-gerrit.ics.uci.edu/#/c/2990/4/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/aggregate/agg_number_rec/agg_number_rec.3.query.sqlpp File asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/aggregate/agg_number_rec/agg_number_rec.3.query.sqlpp: Line 43: )),'stddev':test.coll_stddev(( should be "strict_stddev" https://asterix-gerrit.ics.uci.edu/#/c/2990/4/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/aggregate/stddev_float_null/stddev_float_nu.1.ddl.sqlpp File asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/aggregate/stddev_float_null/stddev_float_nu.1.ddl.sqlpp: Line 1: /* type in the name of this file. should be '_null', not '_nu' https://asterix-gerrit.ics.uci.edu/#/c/2990/4/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/aggregate/stddev_float_null/stddev_float_nu.3.query.sqlpp File asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/aggregate/stddev_float_null/stddev_float_nu.3.query.sqlpp: Line 1: /* type in the name of this file. should be '_null', not '_nu' https://asterix-gerrit.ics.uci.edu/#/c/2990/4/asterixdb/asterix-app/src/test/resources/runtimets/results/aggregate-sql/serial_stddev_float/serial_stddev_float.1.adm File asterixdb/asterix-app/src/test/resources/runtimets/results/aggregate-sql/serial_stddev_float/serial_stddev_float.1.adm: Line 1: { "gid": 1, "stddev": 00.9574271077563381 } why two 0s before the decimal point? https://asterix-gerrit.ics.uci.edu/#/c/2990/4/asterixdb/asterix-app/src/test/resources/runtimets/results/aggregate-sql/sum_int8_null/sum_int8_null.1.adm File asterixdb/asterix-app/src/test/resources/runtimets/results/aggregate-sql/sum_int8_null/sum_int8_null.1.adm: Line 1: -112 this seems to be wrong. please file an issue for this and meanwhile change the input data to avoid overflow https://asterix-gerrit.ics.uci.edu/#/c/2990/4/asterixdb/asterix-om/src/main/java/org/apache/asterix/om/functions/BuiltinFunctions.java File asterixdb/asterix-om/src/main/java/org/apache/asterix/om/functions/BuiltinFunctions.java: Line 1406: addPrivateFunction(SERIAL_LOCAL_SQL_STDDEV, LocalAvgTypeComputer.INSTANCE, true); should be LocalSingleVarStatisticsTypeComputer.INSTANCE Line 1407: addPrivateFunction(SERIAL_INTERMEDIATE_SQL_STDDEV, LocalAvgTypeComputer.INSTANCE, true); should be LocalSingleVarStatisticsTypeComputer.INSTANCE https://asterix-gerrit.ics.uci.edu/#/c/2990/4/asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/aggregates/serializable/std/AbstractSerializableSingleVariableStatisticsAggregateFunction.java File asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/aggregates/serializable/std/AbstractSerializableSingleVariableStatisticsAggregateFunction.java: Line 173: moments m = new moments(m1, m2, count); let's not create a new object for each tuple. We need to figure out a way to avoid it. One possible solution is to maintain each of 3 values in its own AMutableDouble/Int64 and have a method that'd update them for each tuple. Perhaps you could reuse aDouble, aInt64 instance you already have, just need to create another one for the 2nd double. It'd be nice if we could share the updateStepMoments() and combineMoments() with AbstractSingleVarStatisticsAggregateFunction instead of duplicating it there. Creating a helper class could be ok as long as we do not create its new instance for every tuple we process. (so once in init() is fine). Line 194: case TINYINT: { > MAJOR SonarQube violation: Remove { } from case blocks to fix this SonarQube complaint Line 279: moments m = new moments(m1, m2, count); same comment. avoid object creation Line 303: int offset5 = ARecordSerializerDeserializer.getFieldOffsetById(serBytes, offset, COUNT_FIELD_ID, should be "offset3" https://asterix-gerrit.ics.uci.edu/#/c/2990/4/asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/aggregates/std/AbstractSingleVarStatisticsAggregateFunction.java File asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/aggregates/std/AbstractSingleVarStatisticsAggregateFunction.java: Line 96: new IAType[] { BuiltinType.ADOUBLE, BuiltinType.ADOUBLE, BuiltinType.AINT64 }, false); LocalSingleVarStatisticsTypeComputer says that "count"'s type in AINT32, but this runtime code produces AINT64. We need to fix LocalSingleVarStatisticsTypeComputer to return AINT64 for the "count" field. It looks like similar problem exists in LocalAvgTypeComputer. It needs to be fixed there too. Can you file a bug and fix that in a separate change after this one? Line 121: double delta = val - m1; Need to figure out how to reuse this code together with AbstractSerializableSingleVariableStatisticsAggregateFunction instead of having a duplicate code here. PS4, Line 258: offset5 should be "offset3", not "5"? -- To view, visit https://asterix-gerrit.ics.uci.edu/2990 To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia709669a9d20358f11ad28f453ae8ad8551f6334 Gerrit-PatchSet: 4 Gerrit-Project: asterixdb Gerrit-Branch: master Gerrit-Owner: James Fang <[email protected]> Gerrit-Reviewer: Anon. E. Moose #1000171 Gerrit-Reviewer: Dmitry Lychagin <[email protected]> Gerrit-Reviewer: Jenkins <[email protected]> Gerrit-HasComments: Yes
