Inline

> On Nov 1, 2021, at 9:23 AM, Branimir Lambov <blam...@apache.org> wrote:
> 
> As Jacek is not a committer, this proposal needs a shepherd. I would be
> happy to take this role.
> 
>> to me the interfaces has to be at the SSTable level, which then expose
> readers/writers, but also has to expose the other things we do outside of
> those paths
> 
> Could you give some detail on what these things are? Are they something
> different from what the standalone Cassandra tools (scrub/verify/upgrade)
> are currently doing? Obviously, any pluggability proposal will have to
> provide a solution to these, and it would be helpful to know what needs to
> be done beyond making sure the bundled tools work correctly (which includes
> iterating indexes; format-specific operations (e.g. index summary
> redistribution) are excluded as they are to be handled by the individual
> format).

Looking closer at compaction and repair I had forgotten that they were changed 
in CASSANDRA-15861 to go through the reader interface rather than directly 
mutate the files (concurrency bug).  I was thinking the logic which is now 
org.apache.cassandra.io.sstable.format.SSTableReader#mutateLevelAndReload and 
org.apache.cassandra.io.sstable.format.SSTableReader#mutateRepairedAndReload; 
so I believe compaction/repair may be ok with reader/writer; ignore those 
examples.

Checking usage of descriptor you find examples like

org.apache.cassandra.db.streaming.CassandraEntireSSTableStreamReader#read - 
which calls: 
writer.descriptor.getMetadataSerializer().mutate(writer.descriptor, 
description, transform);
org.apache.cassandra.tools.Util#metadataFromSSTable - which is used by 
sstablemetadata tool
org.apache.cassandra.io.sstable.KeyIterator#KeyIterator - directly loads 
primary index from descriptor: new In(new 
File(desc.filenameFor(Component.PRIMARY_INDEX)));

Non of the examples I see couldn’t be rewritten to use read/writer; so relying 
on reader/writer as the main interfaces would work.

> 
> There is another problem in the current code alluded to in the question, in
> the fact that "SSTableReader" (tied to the sstable format and ready for
> querying data (i.e. with open data files and bloom filters loaded in
> memory)) is the only concept that the code uses to work with sstables. As I
> understand it, this proposal does not aim to solve that problem, only to
> make sure that we can properly read and write sstables of a given format,
> including in streaming and standalone tools. In other words, to provide the
> machinery to convert sstable descriptors into sstable readers and writers.
> 
> I see this as an expansion of CASSANDRA-7443 and cleanup of any changes
> that came after it and broke the intended capability.
> 
> Regards,
> Branimir
> 
> On Thu, Oct 28, 2021 at 7:43 PM David Capwell <dcapw...@apple.com.invalid>
> wrote:
> 
>> Sorry about that; used -1/+1 to show preference, not binding action
>> 
>>> On Oct 28, 2021, at 5:50 AM, bened...@apache.org wrote:
>>> 
>>>> I am -1 here, for the reasons listed above; the problem (in my eye) is
>> not reader/writer but higher level at the actual SSTable.  If we plug out
>> read/write but still allow direct file access, then these abstractions fail
>> to provide the goals of the CEP.
>>> 
>>> Be careful dropping -1s, as your -1s here are binding. I realise this
>> isn’t a vote thread, but the effect is the same. IMO we should try to
>> express our preferences and defer to the collective opinion where possible.
>> True -1s should very rarely appear.
>>> 
>>> 
>>> From: David Capwell <dcapw...@apple.com.INVALID>
>>> Date: Wednesday, 27 October 2021 at 15:33
>>> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
>>> Subject: Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)
>>> Reading the CEP I don’t see any mention to the systems which access
>> SSTables; such as streaming (small callout to zero-copy-streaming with
>> ZeroCopyBigTableWriter) and repair.  If you are abstracting out
>> BigTableReader then you are not dealing with the implementation assumptions
>> that users of SSTables have (such as direct mutation to auxiliary files
>> outside of -Data.db).
>>> 
>>>> Audience
>>>>      • Cassandra developers who wish to see SSTableReader and
>> SSTableWriter more modular than they are today,
>>> 
>>> This statement relates to the above comment, many parts of the code do
>> not use Reader/Writer but instead use direct format knowledge to apply
>> changes to the file format (normally outside of -Data.db); to me the
>> interfaces has to be at the SSTable level, which then expose
>> readers/writers, but also has to expose the other things we do outside of
>> those paths.
>>> 
>>>>      • move the metrics related to sstable format out from
>> TableMetrics class and make them tied to certain sstable implementation
>>> 
>>> I am curious about this comment, are you removing exposing this
>> information?
>>> 
>>>>      • have a single factory for creating both readers and writers for
>> particular implementation of sstable and use it consistently - no direct
>> creation of any reader / writer
>>> 
>>> I am -1 here, for the reasons listed above; the problem (in my eye) is
>> not reader/writer but higher level at the actual SSTable.  If we plug out
>> read/write but still allow direct file access, then these abstractions fail
>> to provide the goals of the CEP.
>>> 
>>> I am +1 to the intent of the CEP.
>>> 
>>> And last comment, which I have also done in the other modularity thread…
>> backwards compatibility and maintenance. It is not clear right now what
>> java interfaces may not break and how we can maintain and extend such
>> interfaces in the future.  If the goal is to allow 3rd parties to plugin
>> and offer new SSTable formats, are we as a project ok with having a minor
>> release do a binary or source non-compatible change?  If not how do we
>> detect this?  Until this problem is solved, I do not think we should add
>> any such interfaces.
>>> 
>>>> On Oct 22, 2021, at 7:23 AM, Jeremiah Jordan <jeremiah.jor...@gmail.com>
>> wrote:
>>>> 
>>>> Hi Stefan,
>>>> That idea is not related to this CEP which is about the file formats of
>> the
>>>> sstables, not file system access.  But you should take a look at the
>> work
>>>> recently committed in
>> https://issues.apache.org/jira/browse/CASSANDRA-16926
>>>> to switch to using java.nio.file.Path for file access.  This should
>> allow
>>>> the use of a file system provider to access files which could be the
>> basis
>>>> for work to load the files from S3.
>>>> 
>>>> -Jeremiah
>>>> 
>>>> On Fri, Oct 22, 2021 at 4:07 AM Stefan Miklosovic <
>>>> stefan.mikloso...@instaclustr.com> wrote:
>>>> 
>>>>> One point I would like to add to this; I was already looking into how
>>>>> to extend this but what I saw in SSTableReader was that it is very
>>>>> much "file system oriented". There was not any possibility to actually
>>>>> hook something like that there. I think what importing does is that it
>>>>> will use SSTableReader / Writer stuff so I think that the modification
>>>>> of these classes to accommodate this idea would be necessary.
>>>>> 
>>>>> On Fri, 22 Oct 2021 at 11:02, Stefan Miklosovic
>>>>> <stefan.mikloso...@instaclustr.com> wrote:
>>>>>> 
>>>>>> Hi Jacek,
>>>>>> 
>>>>>> Thanks for taking the lead on this.
>>>>>> 
>>>>>> There was importing of SSTables introduced in 4.0 via
>>>>>> StorageService#importNewSSTables. The "problem" with this is that
>>>>>> SSTables need to be physically located at disk so Cassandra can read
>>>>>> them. If a backup is taken and SSTables are uploaded to, for example,
>>>>>> S3 bucket, then upon restore, all these SSTables need to be downloaded
>>>>>> first and then imported. What about downloading them / importing them
>>>>>> directly from S3? Or any custom source for that matter? Importing of
>>>>>> SSTables is a very nice feature in 4.0, we do not need to copy / hard
>>>>>> link / refresh, it is all handled internally.
>>>>>> 
>>>>>> I am not sure if your work is related to this idea but I would
>>>>>> appreciate it if this is pluggable as well for the sake of simplicity
>>>>>> and effectiveness as we would not have to download all sstables before
>>>>>> importing them.
>>>>>> 
>>>>>> If it is not related, feel free to skip that completely and I guess I
>>>>>> would have to try to push that forward myself.
>>>>>> 
>>>>>> Regards
>>>>>> 
>>>>>> 
>>>>>> On Fri, 22 Oct 2021 at 10:24, Jacek Lewandowski
>>>>>> <lewandowski.ja...@gmail.com> wrote:
>>>>>>> 
>>>>>>> I'd like to start a discussion about SSTable format API proposal
>>>>> (CEP-17)
>>>>>>> 
>>>>>>> Jira: https://issues.apache.org/jira/browse/CASSANDRA-17056
>>>>>>> CEP:
>>>>> 
>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-17%3A+SSTable+format+API
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Jacek
>>>>>>> 
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>>>>>>> For additional commands, e-mail: dev-h...@cassandra.apache.org
>>>>>>> 
>>>>> 
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>>>>> For additional commands, e-mail: dev-h...@cassandra.apache.org
>>>>> 
>>>>> 
>>> 
>>> 
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>>> For additional commands, e-mail: dev-h...@cassandra.apache.org
>> 
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>> For additional commands, e-mail: dev-h...@cassandra.apache.org
>> 
>> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org

Reply via email to