[
https://issues.apache.org/jira/browse/CASSANDRA-21075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18045598#comment-18045598
]
Dmitry Konstantinov commented on CASSANDRA-21075:
-------------------------------------------------
CI test run: [https://pre-ci.cassandra.apache.org/job/cassandra/280/]
[^CASSANDRA-21075_ci_summary.htm]
[^CASSANDRA-21075_trunk_results_details.tar.xz]
The failures are typical and not related to the changed logic:
* [Tests / dtest-latest jdk11 15/64 /
dtest-latest.bootstrap_test.TestBootstrap.test_decommissioned_wiped_node_can_join|https://pre-ci.cassandra.apache.org/job/cassandra/280/testReport/junit/dtest-latest.bootstrap_test/TestBootstrap/Tests___dtest_latest_jdk11_15_64___test_decommissioned_wiped_node_can_join/]
* [Tests / dtest-latest jdk11 14/64 /
dtest-latest.bootstrap_test.TestBootstrap.test_killed_wiped_node_cannot_join|https://pre-ci.cassandra.apache.org/job/cassandra/280/testReport/junit/dtest-latest.bootstrap_test/TestBootstrap/Tests___dtest_latest_jdk11_14_64___test_killed_wiped_node_cannot_join/]
* [Tests / dtest-latest jdk11 13/64 /
dtest-latest.bootstrap_test.TestBootstrap.test_shutdown_wiped_node_cannot_join|https://pre-ci.cassandra.apache.org/job/cassandra/280/testReport/junit/dtest-latest.bootstrap_test/TestBootstrap/Tests___dtest_latest_jdk11_13_64___test_shutdown_wiped_node_cannot_join/]
* [Tests / jvm-dtest jdk11 13/16 /
org.apache.cassandra.distributed.test.accord.AccordWriteInteroperabilityTest.testTransactionStatementApplyIsInteropApply[transactionalMode=full,
migrated=firstPhase]-_jdk11_x86_64|https://pre-ci.cassandra.apache.org/job/cassandra/280/testReport/junit/org.apache.cassandra.distributed.test.accord/AccordWriteInteroperabilityTest/Tests___jvm_dtest_jdk11_13_16___testTransactionStatementApplyIsInteropApply_transactionalMode_full__migrated_firstPhase___jdk11_x86_64/]
* [Tests / jvm-dtest jdk11 5/16 /
org.apache.cassandra.distributed.test.accord.MigrationFromAccordWriteRaceTest.testSplitAndRetryNonSerialLoggedBatchTwoTablesTwoPkeyHintedViaBatchLogRoutingFailure-_jdk11_x86_64|https://pre-ci.cassandra.apache.org/job/cassandra/280/testReport/junit/org.apache.cassandra.distributed.test.accord/MigrationFromAccordWriteRaceTest/Tests___jvm_dtest_jdk11_5_16___testSplitAndRetryNonSerialLoggedBatchTwoTablesTwoPkeyHintedViaBatchLogRoutingFailure__jdk11_x86_64/]
* [Tests / jvm-dtest jdk11 15/16 /
org.apache.cassandra.distributed.test.cql3.PaxosV2MultiNodeTableWalkTest.test-_jdk11_x86_64|https://pre-ci.cassandra.apache.org/job/cassandra/280/testReport/junit/org.apache.cassandra.distributed.test.cql3/PaxosV2MultiNodeTableWalkTest/Tests___jvm_dtest_jdk11_15_16___test__jdk11_x86_64/]
* [Tests / jvm-dtest jdk11 2/16 /
org.apache.cassandra.fuzz.topology.AccordTopologyMixupTest.test-_jdk11_x86_64|https://pre-ci.cassandra.apache.org/job/cassandra/280/testReport/junit/org.apache.cassandra.fuzz.topology/AccordTopologyMixupTest/Tests___jvm_dtest_jdk11_2_16___test__jdk11_x86_64/]
* [Tests / simulator-dtest jdk11 /
org.apache.cassandra.simulator.test.HarrySimulatorTest.test-_jdk11_x86_64|https://pre-ci.cassandra.apache.org/job/cassandra/280/testReport/junit/org.apache.cassandra.simulator.test/HarrySimulatorTest/Tests___simulator_dtest_jdk11___test__jdk11_x86_64/]
* [Tests / test jdk11 14/20 /
org.apache.cassandra.utils.SimpleBitSetSerializersTest.any-_jdk11_x86_64|https://pre-ci.cassandra.apache.org/job/cassandra/280/testReport/junit/org.apache.cassandra.utils/SimpleBitSetSerializersTest/Tests___test_jdk11_14_20___any__jdk11_x86_64/]
* [Tests / dtest jdk11 14/64 /
dtest.bootstrap_test.TestBootstrap.test_killed_wiped_node_cannot_join|https://pre-ci.cassandra.apache.org/job/cassandra/280/testReport/junit/dtest.bootstrap_test/TestBootstrap/Tests___dtest_jdk11_14_64___test_killed_wiped_node_cannot_join/]
* [Tests / simulator-dtest jdk11 /
org.apache.cassandra.simulator.test.ShortAccordSimulationTest.simulationTest-_jdk11_x86_64|https://pre-ci.cassandra.apache.org/job/cassandra/280/testReport/junit/org.apache.cassandra.simulator.test/ShortAccordSimulationTest/Tests___simulator_dtest_jdk11___simulationTest__jdk11_x86_64/]
* [Tests / jvm-dtest jdk11 5/16 /
junit.framework.TestSuite.org.apache.cassandra.distributed.test.accord.MigrationFromAccordWriteRaceTest-_jdk11_x86_64|https://pre-ci.cassandra.apache.org/job/cassandra/280/testReport/junit/junit.framework/TestSuite/Tests___jvm_dtest_jdk11_5_16___org_apache_cassandra_distributed_test_accord_MigrationFromAccordWriteRaceTest__jdk11_x86_64/]
* [Tests / jvm-dtest jdk11 16/16 /
org.apache.cassandra.fuzz.topology.AccordBootstrapTest.terminated
successfully-cassandra.testtag_IS_UNDEFINED|https://pre-ci.cassandra.apache.org/job/cassandra/280/testReport/junit/org.apache.cassandra.fuzz.topology/AccordBootstrapTest/Tests___jvm_dtest_jdk11_16_16___terminated_successfully_cassandra_testtag_IS_UNDEFINED/]
* [Tests / simulator-dtest jdk11 /
org.apache.cassandra.simulator.test.ShortPaxosSimulationTest.selfReconcileTest-cassandra.testtag_IS_UNDEFINED|https://pre-ci.cassandra.apache.org/job/cassandra/280/testReport/junit/org.apache.cassandra.simulator.test/ShortPaxosSimulationTest/Tests___simulator_dtest_jdk11___selfReconcileTest_cassandra_testtag_IS_UNDEFINED/]
> Optimize UTF8Validator.validate for ASCII prefixed Strings
> ----------------------------------------------------------
>
> Key: CASSANDRA-21075
> URL: https://issues.apache.org/jira/browse/CASSANDRA-21075
> Project: Apache Cassandra
> Issue Type: Improvement
> Components: CQL/Interpreter
> Reporter: Dmitry Konstantinov
> Assignee: Dmitry Konstantinov
> Priority: Normal
> Fix For: 5.x
>
> Attachments: CASSANDRA-21075_ci_summary.htm,
> CASSANDRA-21075_trunk_results_details.tar.xz, batch_profile.yaml,
> before_cpu.html, utf8_after_cpu.html
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> During a write we validate every string received from a client. The String
> (text) type is very popular, and although a column is declared as text, many
> values are in practice pure ASCII or ASCII-prefixed. For example, a table with
> a partition key, a clustering key and 5 value columns needs 7 validations per
> row, so a 10-row batch needs 70 validations. More complicated table
> structures with UDTs/collections are not rare, and in that case the
> number of string values to validate can be quite high. So even a small
> improvement here can be beneficial.
> In my batch write test, UTF8 validation contributes 2.1% of CPU:
> [^before_cpu.html]
> In UTF8Validator.validate we can apply the same optimization that Guava and
> the JDK use: a plain loop scans for the first non-ASCII byte, and only then
> falls back to the more complicated UTF-8 parsing:
> * [https://github.com/google/guava/blob/master/guava/src/com/google/common/base/Utf8.java#L123]
> {code:java}
> for (int i = off; i < end; i++) {
>   if (bytes[i] < 0) {
>     return isWellFormedSlowPath(bytes, i, end);
>   }
> }
> {code}
> * java.lang.StringCoding#decodeUTF8
> {code:java}
> // ascii-bais, which has a relative impact to the non-ascii-only bytes
> if (COMPACT_STRINGS && !hasNegatives(src, sp, len))
>   return resultCached().with(Arrays.copyOfRange(src, sp, sp + len),
>                              LATIN1);
> return decodeUTF8_0(src, sp, len, doReplace);
>
> // where:
> public static boolean hasNegatives(byte[] ba, int off, int len) {
>   for (int i = off; i < off + len; i++) {
>     if (ba[i] < 0) {
>       return true;
>     }
>   }
>   return false;
> }
> {code}
> See also:
> [https://lemire.me/blog/2018/10/16/validating-utf-8-bytes-java-edition/]
> Additionally, going through ValueAccessor is not free, and by avoiding it
> we can get an extra boost, especially in non-monomorphic cases.
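
The ASCII fast path described in the issue can be sketched roughly as follows. This is a minimal illustration, not the actual patch: the class name AsciiPrefixUtf8Check and both method names are hypothetical, the code works on a plain byte[] rather than through ValueAccessor, and the slow path is a stand-in that uses the strict JDK CharsetDecoder, whose rules may differ from those of Cassandra's UTF8Validator.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not the Cassandra UTF8Validator.
final class AsciiPrefixUtf8Check {
    private AsciiPrefixUtf8Check() {}

    /** Returns the index of the first non-ASCII byte in [off, end), or end if there is none. */
    static int asciiPrefixEnd(byte[] bytes, int off, int end) {
        int i = off;
        // ASCII bytes are 0x00..0x7F, i.e. non-negative as signed Java bytes.
        while (i < end && bytes[i] >= 0) {
            i++;
        }
        return i;
    }

    /** Validates bytes[off, off + len) as UTF-8, skipping the ASCII prefix cheaply. */
    static boolean isValidUtf8(byte[] bytes, int off, int len) {
        int end = off + len;
        int i = asciiPrefixEnd(bytes, off, end);
        if (i == end) {
            return true; // pure ASCII: nothing left to parse
        }
        // Slow path stand-in (assumption): strict JDK decoding of the remainder.
        // Starting at i is safe because every ASCII byte is a complete code point.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes, i, end - i));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

With this shape, a pure-ASCII value costs one comparison per byte and never enters the multi-byte state machine, and an ASCII-prefixed value only pays for full parsing from its first non-ASCII byte onwards.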