+1 to this approach.
Eh, they'll all be AIs anyway and will just rewrite the code in a
background thread.
It's a possibility, though I haven't coded and benchmarked such an
approach, and I don't think I would have the time before the freeze to
take advantage of the sstable format change opportunity. Still, it's
something that can be explored later. If we can shave off a few extra %,
that would always be great imo.
On 23/6/23 13:57, Benedict wrote:
> If we’re doing this, why don’t we delta encode a vint from some
> per-sstable minimum value? I’d expect that to commonly compress to a
> single byte or so.
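A minimal sketch of the delta-encoding suggestion above, under stated assumptions: the per-sstable minimum (`minTimestamp`) is known to both writer and reader (for example, kept in sstable metadata), every non-LIVE `markedForDeleteAt` is at or above it, and LIVE (`Long.MIN_VALUE`, which I believe is how `DeletionTime.LIVE` represents it) is reserved as encoded value 0. The varint is a plain LEB128-style unsigned encoding used as a stand-in; it is not Cassandra's own `VIntCoding` layout, and the class name is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

/**
 * Sketch only: delta-encodes markedForDeleteAt (microseconds) against a
 * per-sstable minimum and writes it as an unsigned LEB128-style varint.
 * Class name, LIVE handling and varint layout are illustrative assumptions.
 */
final class DeltaVIntSketch {
    // Assumed LIVE marker for markedForDeleteAt (Long.MIN_VALUE).
    static final long LIVE = Long.MIN_VALUE;

    // Hypothetical per-sstable minimum; assumes every non-LIVE value is >= this.
    private final long minTimestamp;

    DeltaVIntSketch(long minTimestamp) {
        this.minTimestamp = minTimestamp;
    }

    /** Encodes LIVE as 0, otherwise (mfda - min) + 1, 7 bits per byte, low bits first. */
    void write(long markedForDeleteAt, ByteArrayOutputStream out) {
        long v = markedForDeleteAt == LIVE ? 0 : (markedForDeleteAt - minTimestamp) + 1;
        while ((v & ~0x7FL) != 0) {
            out.write((int) ((v & 0x7F) | 0x80)); // more bytes follow: continuation bit set
            v >>>= 7;
        }
        out.write((int) v); // last byte: continuation bit clear
    }

    /** Reverses write(): accumulates 7-bit groups until a byte without the continuation bit. */
    long read(ByteBuffer in) {
        long v = 0;
        int shift = 0;
        byte b;
        do {
            b = in.get();
            v |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return v == 0 ? LIVE : minTimestamp + (v - 1);
    }
}
```

How many bytes this saves depends on how spread out an sstable's deletion timestamps are: with this layout, an encoded value below 128 (a delta under 127 µs) takes 1 byte, while a delta of one second (10^6 µs) already takes 3 bytes.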
>
>>
>> Distant future people will not be happy about this, I can already
>> tell you now.
>>
>> Sounds like a reasonable improvement to me, however.
>>
>>>
>>> Hi all,
>>>
>>> DeletionTime.markedForDeleteAt is a long holding microseconds since
>>> the Unix epoch. But I noticed that with 7 bytes we can already encode
>>> ~2284 years. We can either shed the 8th byte, for reduced IO and disk
>>> usage, or use it to encode some sentinel values (such as LIVE) as
>>> flags. For those sentinels that would mean reading and writing 1 byte
>>> instead of 12 (8 for the mfda long + 4 for the ldts int). Yes, we
>>> already avoid serializing DeletionTime (DT) in sstables entirely at the
>>> _row_ level, but not at the _partition_ level, and it is also
>>> serialized in the index, metadata, etc.
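A minimal sketch of the flag-byte variant described above, assuming a big-endian layout, non-negative timestamps below 2^56 µs (so the top byte of `markedForDeleteAt` is always 0), and a hypothetical `LIVE_FLAG` bit in that byte. All names are illustrative; this is not Cassandra's actual `DeletionTime` serializer. It also reproduces the ~2284-year figure using 365-day years.

```java
import java.nio.ByteBuffer;

/**
 * Sketch only: a partition-level DeletionTime whose LIVE value costs 1 byte
 * instead of 12. Assumes markedForDeleteAt is a non-negative microsecond
 * timestamp below 2^56, so its top byte is free to carry flags.
 */
final class FlaggedDeletionTimeSketch {
    static final int LIVE_FLAG = 0x80;           // hypothetical flag bit in the top byte
    static final long MAX_MFDA = (1L << 56) - 1; // largest value that fits in 7 bytes

    static final class DT {
        final long markedForDeleteAt; // microseconds since the Unix epoch
        final int localDeletionTime;  // seconds since the Unix epoch

        DT(long markedForDeleteAt, int localDeletionTime) {
            this.markedForDeleteAt = markedForDeleteAt;
            this.localDeletionTime = localDeletionTime;
        }

        boolean isLive() {
            return markedForDeleteAt == Long.MIN_VALUE && localDeletionTime == Integer.MAX_VALUE;
        }
    }

    static final DT LIVE = new DT(Long.MIN_VALUE, Integer.MAX_VALUE);

    /** LIVE: 1 flag byte. Otherwise: 8-byte mfda (top byte 0) + 4-byte ldt = 12 bytes. */
    static void serialize(DT dt, ByteBuffer out) {
        if (dt.isLive()) {
            out.put((byte) LIVE_FLAG);
            return;
        }
        if (dt.markedForDeleteAt < 0 || dt.markedForDeleteAt > MAX_MFDA)
            throw new IllegalArgumentException("markedForDeleteAt outside 56-bit range: " + dt.markedForDeleteAt);
        out.putLong(dt.markedForDeleteAt); // big-endian by default, so the (zero) flag byte comes first
        out.putInt(dt.localDeletionTime);
    }

    static DT deserialize(ByteBuffer in) {
        int flags = in.get(in.position()) & 0xFF; // peek at the top byte without consuming it
        if ((flags & LIVE_FLAG) != 0) {
            in.get(); // consume the single flag byte
            return LIVE;
        }
        long mfda = in.getLong();
        int ldt = in.getInt();
        return new DT(mfda, ldt);
    }

    /** Whole 365-day years representable by 56 bits of microseconds. */
    static long yearsIn56Bits() {
        long microsPerYear = 365L * 24 * 3600 * 1_000_000;
        return MAX_MFDA / microsPerYear;
    }
}
```

The range check is where this approach would bite: if client-supplied timestamps can be negative or exceed 2^56 µs, they would not fit, so a real format would need a fallback path for those values.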
>>>
>>>
>>> Benchmark                                (liveDTPcParam)  (sstableParam)  Mode  Cnt  Score   Error  Units
>>> DeletionTimeDeSerBench.testRawAlgReads   70PcLive         NC              avgt  15   0.331 ± 0.001  ns/op
>>> DeletionTimeDeSerBench.testRawAlgReads   70PcLive         OA              avgt  15   0.335 ± 0.004  ns/op
>>> DeletionTimeDeSerBench.testRawAlgReads   30PcLive         NC              avgt  15   0.334 ± 0.002  ns/op
>>> DeletionTimeDeSerBench.testRawAlgReads   30PcLive         OA              avgt  15   0.340 ± 0.008  ns/op
>>> DeletionTimeDeSerBench.testNewAlgWrites  70PcLive         NC              avgt  15   0.337 ± 0.006  ns/op
>>> DeletionTimeDeSerBench.testNewAlgWrites  70PcLive         OA              avgt  15   0.340 ± 0.004  ns/op
>>> DeletionTimeDeSerBench.testNewAlgWrites  30PcLive         NC              avgt  15   0.339 ± 0.004  ns/op
>>> DeletionTimeDeSerBench.testNewAlgWrites  30PcLive         OA              avgt  15   0.343 ± 0.016  ns/op
>>>
>>> That was ByteBuffer-backed, to isolate the impact of the extra
>>> bit-level operations. But what would the impact be in an end-to-end
>>> test against disk?
>>>
>>> Benchmark                                    (diskRAMParam)  (liveDTPcParam)  (sstableParam)  Mode  Cnt        Score        Error  Units
>>> DeletionTimeDeSerBench.testE2EDeSerializeDT  RAM             70PcLive         NC              avgt  15    605236.515 ±  19929.058  ns/op
>>> DeletionTimeDeSerBench.testE2EDeSerializeDT  RAM             70PcLive         OA              avgt  15    586477.039 ±   7384.632  ns/op
>>> DeletionTimeDeSerBench.testE2EDeSerializeDT  RAM             30PcLive         NC              avgt  15    937580.311 ±  30669.647  ns/op
>>> DeletionTimeDeSerBench.testE2EDeSerializeDT  RAM             30PcLive         OA              avgt  15    914097.770 ±   9865.070  ns/op
>>> DeletionTimeDeSerBench.testE2EDeSerializeDT  Disk            70PcLive         NC              avgt  15   1314417.207 ±  37879.012  ns/op
>>> DeletionTimeDeSerBench.testE2EDeSerializeDT  Disk            70PcLive         OA              avgt  15    805256.345 ±  15471.587  ns/op
>>> DeletionTimeDeSerBench.testE2EDeSerializeDT  Disk            30PcLive         NC              avgt  15   1583239.011 ±  50104.245  ns/op
>>> DeletionTimeDeSerBench.testE2EDeSerializeDT  Disk            30PcLive         OA              avgt  15   1439605.006 ±  64342.510  ns/op
>>> DeletionTimeDeSerBench.testE2ESerializeDT    RAM             70PcLive         NC              avgt  15    295711.217 ±   5432.507  ns/op
>>> DeletionTimeDeSerBench.testE2ESerializeDT    RAM             70PcLive         OA              avgt  15    305282.827 ±   1906.841  ns/op
>>> DeletionTimeDeSerBench.testE2ESerializeDT    RAM             30PcLive         NC              avgt  15    446029.899 ±   4038.938  ns/op
>>> DeletionTimeDeSerBench.testE2ESerializeDT    RAM             30PcLive         OA              avgt  15    479085.875 ±  10032.804  ns/op
>>> DeletionTimeDeSerBench.testE2ESerializeDT    Disk            70PcLive         NC              avgt  15   1789434.838 ± 206455.771  ns/op
>>> DeletionTimeDeSerBench.testE2ESerializeDT    Disk            70PcLive         OA              avgt  15    589752.861 ±  31676.265  ns/op
>>> DeletionTimeDeSerBench.testE2ESerializeDT    Disk            30PcLive         NC              avgt  15   1754862.122 ± 164903.051  ns/op
>>> DeletionTimeDeSerBench.testE2ESerializeDT    Disk            30PcLive         OA              avgt  15   1252162.253 ± 121626.818  ns/op
>>>
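>>> We can see big improvements when backed by disk (e.g. for Disk/70PcLive,
>>> deserialization drops from ~1.31 ms/op with NC to ~0.81 ms/op with OA,
>>> and serialization from ~1.79 ms/op to ~0.59 ms/op) and little overhead
>>> from the new algorithm itself.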
>>>
>>> Given we're already introducing a new sstable format (OA) in 5.0, I
>>> would like to try to get this in before the freeze. The point is that
>>> sstables with lots of small partitions would benefit from a smaller DT
>>> at the partition level. My tests show a 3%-4% size reduction on disk.
>>>
>>> Before proceeding, though, I'd like to bounce the idea off the
>>> community: are there any corner cases or scenarios I might have missed
>>> where this could be a problem?
>>>
>>> Thx in advance!