> I am -1 here, for the reasons listed above; the problem (in my eye) is not > reader/writer but higher level at the actual SSTable. If we plug out > read/write but still allow direct file access, then these abstractions fail > to provide the goals of the CEP.
Be careful dropping -1s, as your -1s here are binding. I realise this isn’t a vote thread, but the effect is the same. IMO we should try to express our preferences and defer to the collective opinion where possible. True -1s should very rarely appear. From: David Capwell <dcapw...@apple.com.INVALID> Date: Wednesday, 27 October 2021 at 15:33 To: dev@cassandra.apache.org <dev@cassandra.apache.org> Subject: Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056) Reading the CEP I don’t see any mention to the systems which access SSTables; such as streaming (small callout to zero-copy-streaming with ZeroCopyBigTableWriter) and repair. If you are abstracting out BigTableReader then you are not dealing with the implementation assumptions that users of SSTables have (such as direct mutation to auxiliary files outside of -Data.db). > Audience > • Cassandra developers who wish to see SSTableReader and SSTableWriter > more modular than they are today, This statement relates to the above comment, many parts of the code do not use Reader/Writer but instead use direct format knowledge to apply changes to the file format (normally outside of -Data.db); to me the interfaces has to be at the SSTable level, which then expose readers/writers, but also has to expose the other things we do outside of those paths. > • move the metrics related to sstable format out from TableMetrics > class and make them tied to certain sstable implementation I am curious about this comment, are you removing exposing this information? > • have a single factory for creating both readers and writers for > particular implementation of sstable and use it consistently - no direct > creation of any reader / writer I am -1 here, for the reasons listed above; the problem (in my eye) is not reader/writer but higher level at the actual SSTable. If we plug out read/write but still allow direct file access, then these abstractions fail to provide the goals of the CEP. I am +1 to the intent of the CEP. And last comment, which I have also done in the other modularity thread… backwards compatibility and maintenance. It is not clear right now what java interfaces may not break and how we can maintain and extend such interfaces in the future. If the goal is to allow 3rd parties to plugin and offer new SSTable formats, are we as a project ok with having a minor release do a binary or source non-compatible change? If not how do we detect this? Until this problem is solved, I do not think we should add any such interfaces. > On Oct 22, 2021, at 7:23 AM, Jeremiah Jordan <jeremiah.jor...@gmail.com> > wrote: > > Hi Stefan, > That idea is not related to this CEP which is about the file formats of the > sstables, not file system access. But you should take a look at the work > recently committed in https://issues.apache.org/jira/browse/CASSANDRA-16926 > to switch to using java.nio.file.Path for file access. This should allow > the use of a file system provider to access files which could be the basis > for work to load the files from S3. > > -Jeremiah > > On Fri, Oct 22, 2021 at 4:07 AM Stefan Miklosovic < > stefan.mikloso...@instaclustr.com> wrote: > >> One point I would like to add to this; I was already looking into how >> to extend this but what I saw in SSTableReader was that it is very >> much "file system oriented". There was not any possibility to actually >> hook something like that there. I think what importing does is that it >> will use SSTableReader / Writer stuff so I think that the modification >> of these classes to accommodate this idea would be necessary. >> >> On Fri, 22 Oct 2021 at 11:02, Stefan Miklosovic >> <stefan.mikloso...@instaclustr.com> wrote: >>> >>> Hi Jacek, >>> >>> Thanks for taking the lead on this. >>> >>> There was importing of SSTables introduced in 4.0 via >>> StorageService#importNewSSTables. The "problem" with this is that >>> SSTables need to be physically located at disk so Cassandra can read >>> them. If a backup is taken and SSTables are uploaded to, for example, >>> S3 bucket, then upon restore, all these SSTables need to be downloaded >>> first and then imported. What about downloading them / importing them >>> directly from S3? Or any custom source for that matter? Importing of >>> SSTables is a very nice feature in 4.0, we do not need to copy / hard >>> link / refresh, it is all handled internally. >>> >>> I am not sure if your work is related to this idea but I would >>> appreciate it if this is pluggable as well for the sake of simplicity >>> and effectiveness as we would not have to download all sstables before >>> importing them. >>> >>> If it is not related, feel free to skip that completely and I guess I >>> would have to try to push that forward myself. >>> >>> Regards >>> >>> >>> On Fri, 22 Oct 2021 at 10:24, Jacek Lewandowski >>> <lewandowski.ja...@gmail.com> wrote: >>>> >>>> I'd like to start a discussion about SSTable format API proposal >> (CEP-17) >>>> >>>> Jira: https://issues.apache.org/jira/browse/CASSANDRA-17056 >>>> CEP: >> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-17%3A+SSTable+format+API >>>> >>>> Thanks, >>>> Jacek >>>> >>>> --------------------------------------------------------------------- >>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org >>>> For additional commands, e-mail: dev-h...@cassandra.apache.org >>>> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org >> For additional commands, e-mail: dev-h...@cassandra.apache.org >> >> --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org For additional commands, e-mail: dev-h...@cassandra.apache.org