>From Ian Maxon <[email protected]>: Ian Maxon has posted comments on this change. ( https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/10187 )
Change subject: [ASTERIXDB-2838][RT][FUN] Batched PyUDF calls ...................................................................... Patch Set 18: (16 comments) https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/10187/16/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-library/mysentiment_twitter/mysentiment_twitter.0.ddl.sqlpp File asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-library/mysentiment_twitter/mysentiment_twitter.0.ddl.sqlpp: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/10187/16/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-library/mysentiment_twitter/mysentiment_twitter.0.ddl.sqlpp@29 PS16, Line 29: load dataset Tweet using localfs(("path"="asterix_nc1://data/twitter/real.adm"),("format"="adm")); > we usually put load statement into a separate, .update.sqlpp file. Done https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/10187/16/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-library/mysentiment_twitter/mysentiment_twitter.10.ddl.sqlpp File asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-library/mysentiment_twitter/mysentiment_twitter.10.ddl.sqlpp: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/10187/16/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-library/mysentiment_twitter/mysentiment_twitter.10.ddl.sqlpp@23 PS16, Line 23: as "sentiment", "TweetSent.sentiment" at testlib with {"null-call": "true"}; > minor. "null-call" value could also be a boolean. […] actually i don't believe so, i will add it in the api patch https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/10187/16/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-library/mysentiment_twitter/mysentiment_twitter.5.query.sqlpp File asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-library/mysentiment_twitter/mysentiment_twitter.5.query.sqlpp: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/10187/16/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-library/mysentiment_twitter/mysentiment_twitter.5.query.sqlpp@23 PS16, Line 23: limit 100; > need ORDER BY clause to establish stable order of the results. Done https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/10187/16/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-library/mysentiment_twitter/mysentiment_twitter.6.query.sqlpp File asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-library/mysentiment_twitter/mysentiment_twitter.6.query.sqlpp: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/10187/16/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-library/mysentiment_twitter/mysentiment_twitter.6.query.sqlpp@25 PS16, Line 25: limit 100; > need ORDER BY clause. Done https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/10187/16/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-library/mysentiment_twitter/mysentiment_twitter.7.update.sqlpp File asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-library/mysentiment_twitter/mysentiment_twitter.7.update.sqlpp: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/10187/16/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-library/mysentiment_twitter/mysentiment_twitter.7.update.sqlpp@29 PS16, Line 29: { "create_at": datetime("2015-11-23T16:14:03.000Z"), "id": 31, "in_reply_to_status": -1, "in_reply_to_user": -1, "favorite_count": 0, "retweet_count": 0, "lang": "und", "is_retweet": false, "user": { "id": 31, "name": "Asia✨02/03❤️‼️", "screen_name": "AsiaCarlton", "lang": "en", "location": "Houston, TX", "create_at": date("2013-02-17"), "description": "I'm A Mf'Kn Queen ❣✊ #freenick", "followers_count": 1083, "friends_count": 924, "statues_count": 6067 }, "place": { "country": "United States", "country_code": "United States", "full_name": "Houston, TX", "id": "1c69a67ad480e1b1", "name": "Houston", "place_type": "city", "bounding_box": rectangle("-95.823268,29.522325 -95.069705,30.154665") }, "geo_tag": { "stateID": 48, "stateName": "Texas", "countyID": 48201, "countyName": "Harris", "cityID": 4835000, "cityName": "Houston" } }, > We can't have this kind of language in our tests. […] this is the data actually- i just changed the IDs. to hide the shall we say colorful language here and in the results i altered this to be an insert, it's better that way anyway. and for previous tests that query the top 100, i made it do string length instead of field access. https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/10187/16/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/library/ExternalScalarPythonFunctionEvaluator.java File asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/library/ExternalScalarPythonFunctionEvaluator.java: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/10187/16/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/library/ExternalScalarPythonFunctionEvaluator.java@141 PS16, Line 141: throw new HyracksDataException(e.getMessage()); > use HyrackDataException. […] Done https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/10187/16/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/operators/ExternalAssignBatchRuntimeFactory.java File asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/operators/ExternalAssignBatchRuntimeFactory.java: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/10187/16/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/operators/ExternalAssignBatchRuntimeFactory.java@103 PS16, Line 103: argHolders = Arrays.stream(fnArgColumns).map(i -> ByteBuffer.wrap(new byte[ctx.getInitialFrameSize()])) > why do you not use ctx. […] didn't know about it, fixed https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/10187/16/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/operators/ExternalAssignBatchRuntimeFactory.java@105 PS16, Line 105: outputWrapper = ByteBuffer.wrap(new byte[ctx.getInitialFrameSize()]); > why do you not use ctx. […] didn't know about it, fixed https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/10187/16/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/operators/ExternalAssignBatchRuntimeFactory.java@113 PS16, Line 113: nullCalls[func] = new ATypeTag[numTuples]; > can we avoid array creation here (array per frame) and reuse/extend existing > array? yeah just need to not iterate on it which is fine https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/10187/16/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/operators/ExternalAssignBatchRuntimeFactory.java@139 PS16, Line 139: try (MessageUnpacker errorUnpacker = MessagePack.newDefaultUnpacker(result.getFirst())) { > MessageUnpacker has reset() method. […] Done https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/10187/16/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/operators/ExternalAssignBatchRuntimeFactory.java@167 PS16, Line 167: for (int colIdx = 0; colIdx < cols.length; colIdx++) { > If fnDescs[func].getFunctionInfo(). […] true think i even had it here at one point. fixed https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/10187/16/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/operators/ExternalAssignBatchRuntimeFactory.java@191 PS16, Line 191: //TODO: maybe this could be done in parallel for each unique library evaluator? > we shouldn't start any threads here, but may be we could somehow introduce an > async API for callPyth […] hm yes. this would be a slightly scary refactor though because it kind of needs to go from a synchronous operation to more like fork/join. callPythonMulti() would need an await() for the number of calls or something to get recieved back on the IPCProto... https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/10187/16/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/operators/ExternalAssignBatchRuntimeFactory.java@192 PS16, Line 192: List<Pair<ByteBuffer, Counter>> batchResults = new ArrayList<>(argHolders.size()); > can we avoid object creation here? yes idk why i did it like this https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/10187/16/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/operators/ExternalAssignBatchRuntimeFactory.java@200 PS16, Line 200: batchResults.add(new Pair<>(columnResult, new Counter(numResults))); > avoid object creation? Done https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/10187/16/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/operators/ExternalAssignBatchRuntimeFactory.java@202 PS16, Line 202: batchResults.add(new Pair<>(null, new Counter(-1))); > avoid object creation? Done https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/10187/16/asterixdb/asterix-om/src/main/java/org/apache/asterix/om/functions/ExternalFunctionInfo.java File asterixdb/asterix-om/src/main/java/org/apache/asterix/om/functions/ExternalFunctionInfo.java: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/10187/16/asterixdb/asterix-om/src/main/java/org/apache/asterix/om/functions/ExternalFunctionInfo.java@33 PS16, Line 33: private static final long serialVersionUID = 4L; > increase serialVersionUID? oops, done -- To view, visit https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/10187 To unsubscribe, or for help writing mail filters, visit https://asterix-gerrit.ics.uci.edu/settings Gerrit-Project: asterixdb Gerrit-Branch: cheshire-cat Gerrit-Change-Id: I5af4da999985afcc33cdfacea79576f1d6109173 Gerrit-Change-Number: 10187 Gerrit-PatchSet: 18 Gerrit-Owner: Ian Maxon <[email protected]> Gerrit-Reviewer: Anon. E. Moose #1000171 Gerrit-Reviewer: Dmitry Lychagin <[email protected]> Gerrit-Reviewer: Ian Maxon <[email protected]> Gerrit-Reviewer: Jenkins <[email protected]> Gerrit-Reviewer: Michael Blow <[email protected]> Gerrit-Comment-Date: Wed, 10 Mar 2021 18:09:49 +0000 Gerrit-HasComments: Yes Gerrit-Has-Labels: No Comment-In-Reply-To: Dmitry Lychagin <[email protected]> Gerrit-MessageType: comment
