Thanks Kenn for moving this forward. What still bugs me, though, is whether we have a consensus about what we actually do with the different kinds of annotations. Can we say, for example, that “@Experimental(Experimental.Kind.SOURCE_SINK)” is useless and we can get rid of it easily? Or that, since the Schema API is still under development, “@Experimental(Kind.SCHEMAS)” is required everywhere Schema is used in a public API? And so on… Does it make sense to break this list down by kind of experimental annotation, with the final decision for every kind depending on that?
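To make the question concrete, here is a minimal sketch of the two cases above (the class names are made up; only the annotation and the Kind values come from this thread):

```java
import org.apache.beam.sdk.annotations.Experimental;
import org.apache.beam.sdk.annotations.Experimental.Kind;

// (1) An IO connector marked SOURCE_SINK - arguably removable once the
// connector has been stable in production for several releases.
@Experimental(Kind.SOURCE_SINK)
class MyDatabaseIO { /* ... */ }

// (2) A schema-facing class marked SCHEMAS - arguably has to keep the
// annotation for as long as the Schema API itself is still evolving.
@Experimental(Kind.SCHEMAS)
class MySchemaRegistrar { /* ... */ }
```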
> On 9 Mar 2020, at 04:39, Kenneth Knowles <[email protected]> wrote:
>
> On Sun, Mar 8, 2020 at 1:55 PM Ismaël Mejía <[email protected]> wrote:
>
> Kenn can you adjust the script to match only source code files ... otherwise
> it produces a lot of extra false positives
>
> I think the sheet only had false matches in build/ directories. Removed. Can
> you comment on other cells that look like a new class of false positives?
>
> Also can we extract the full annotation as a column so we can filter/group
> for the full kind (type) of the experimental annotation, e.g.
> @Experimental(Kind.SCHEMAS)?
>
> This was already done. It is column D. Maybe it is off the side of the screen
> for you?
>
> What we agreed with Luke Cwik was to remove the Experimental annotations from
> ‘runners/core*’.
>
> Makes sense; this was never end-user facing.
>
> It is probably worth re-running the script against the latest master, because
> the results in the spreadsheet do not correspond to the current master.
>
> Hmmm. I just checked, and the directory I ran it in has a detached
> github/master checked out. So it might be a little stale, but not much. Since
> people have started to sign up, it would be a shame to reset the sheet. The
> files are probably still worth looking at, even if the line numbers don't
> match, and if a file was already processed, that is an easy case.
>
> We also introduced package-level Experimental annotations (package-info.java),
> so this can easily account for 50 duplicates that should probably be trimmed
> by the same person who is covering the corresponding files in the package.
> With all these adjustments we will easily be below 250 matches.
>
> I agree that it is efficient, but I worry that a package-level Experimental
> annotation is basically invisible to users. Since I sorted by filename, it
> should be easy to write your name once and then drag it to a whole set of
> files? Really we mostly only care about "what file, and which KIND annotations
> are present". I just made a new tab with that info, but it did not gather all
> the different annotations that may be in the file.
>
> Kenn
>
> Regards,
> Ismaël
>
> [1] https://lists.apache.org/thread.html/r73d3b19506ea435ee6be568ccc32065e36cd873dbbcf2a3e9049254e%40%3Cdev.beam.apache.org%3E
>
> > On Fri, Mar 6, 2020 at 11:54 PM Kenneth Knowles <[email protected]> wrote:
> >
> > OK, I tried to make a tiny bit of progress on this. With `grep --ignore-case
> > --line-number --recursive '@experimental' .` there are 578 occurrences
> > (including the website and comments). Via `| cut -d ':' -f 1 | sort | uniq |
> > wc -l` there are 377 distinct code files.
> >
> > So that's a big project, but it easily scales to the contributors. I suggest
> > we need to crowdsource a bit.
> >
> > I created
> > https://docs.google.com/spreadsheets/d/1T98I7tFoUgwW2tegS5xbNRjaVDvZiBBLn7jg0Ef_IwU/edit?usp=sharing
> > where you can suggest/comment adding your name to a file to volunteer to own
> > going through that file.
> >
> > I have not checked git history to try to find owners.
> >
> > Kenn
> >
> > On Mon, Dec 2, 2019 at 10:26 AM Alexey Romanenko <[email protected]> wrote:
> >>
> >> Thank you Kenn for starting this discussion.
> >>
> >> As I see it, for now, the main goal for the “@Experimental” annotation is
> >> for it to live up to its name and be useful in the sense its name implies
> >> (which is obviously not the case at the moment). I'd suggest a somewhat
> >> simpler scenario for this:
> >>
> >> 1. We do a revision of all current uses of the “@Experimental” annotation.
> >> For code (IOs/libs/etc.) that we know with certainty has been used in
> >> production for a long time with the current stable API, we simply remove
> >> the annotation, since it is not needed anymore.
> >>
> >> 2. For the code that is left after p.1, we keep it “@Experimental”, wait
> >> for N releases (N=3?), and then remove the annotation if no breaking
> >> changes have happened. We may want to add a new argument to “@Experimental”
> >> to keep track of the release number in which it was added.
> >>
> >> 3. We would need a regular “Experimental annotations report” (like the one
> >> we have for dependencies) sent to dev@; it would allow us to track new and
> >> outdated annotations.
> >>
> >> 4. And of course we update the contributor documentation about all this.
> >>
> >> The idea of graduation by voting seems a bit complicated to me - it means
> >> that all newly added user APIs would have to go through this process, and
> >> I'm afraid that in the end we could be overwhelmed by the number of such
> >> polls. I think that several releases of maturation, plus a final decision
> >> by the person(s) responsible for the component, should be enough.
> >>
> >> At the same time, I like Andrew's idea of checking for breaking changes
> >> with an external tool. It could guarantee that we remove the experimental
> >> state without any fear of breaking the API.
> >>
> >> For breaking changes to a stable API that cannot be avoided, we can still
> >> use @Deprecated and wait for 3 releases before removal (as we have done
> >> before). So having up-to-date @Experimental and @Deprecated annotations
> >> won't be confusing for users.
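As a side note, the “new argument” from point 2 above might look like the following sketch. The `since` element is hypothetical, not existing Beam API, and the annotation itself is simplified here (retention, targets, and most Kind values elided):

```java
package org.apache.beam.sdk.annotations;

import java.lang.annotation.Documented;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Simplified sketch of Beam's @Experimental annotation.
@Documented
@Retention(RetentionPolicy.CLASS)
public @interface Experimental {
  Kind value() default Kind.UNSPECIFIED;

  // Hypothetical addition: the release in which the annotation was first
  // applied, so a periodic report can flag annotations older than N releases.
  String since() default "";

  enum Kind {
    UNSPECIFIED,
    SOURCE_SINK,
    SCHEMAS
    // ... other kinds elided
  }
}
```

Usage would then be, e.g., `@Experimental(value = Kind.SCHEMAS, since = "2.20.0")`, and the report from point 3 could compare `since` against the current release.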
> >>
> >> On 28 Nov 2019, at 04:48, Kenneth Knowles <[email protected]> wrote:
> >>
> >> On Wed, Nov 27, 2019 at 1:04 PM Elliotte Rusty Harold <[email protected]> wrote:
> >>>
> >>> On Wed, Nov 27, 2019 at 1:12 PM Kenneth Knowles <[email protected]> wrote:
> >>> >
> >>> > *Opt-in*: This is a powerful idea that I think changes everything.
> >>> > - for an experimental new IO, a separate artifact; this way we can
> >>> > also see downloads
> >>> > - for experimental code fragments, add a checkState that the relevant
> >>> > experiment is turned on via flags
> >>>
> >>> To be clear, the experimental artifact would have the same group ID and
> >>> artifact ID but a different version than the non-experimental artifacts?
> >>> E.g. org.apache.beam:beam-runners-gcp-gcemd:2.4.0-experimental
> >>>
> >>> That could work. Changing the artifact ID or the package name would risk
> >>> split-package issues and diamond dependency problems. We'd still need to
> >>> be careful about mixing experimental and non-experimental artifacts.
> >>
> >> That's clever! I think using the classifier might be better than a modified
> >> version number, e.g. org.apache.beam:beam-io-mydb:2.4.0:experimental
> >>
> >> My prior idea was much less clever: for any version 2.X there would either
> >> be beam-io-mydb-experimental or beam-io-mydb (after graduation), so there
> >> would be no problem with a split package and no "same artifact id" concern.
> >>
> >> Your idea would allow us to ship two variants of the library, if we
> >> developed the tooling for it. I think stripping out the experimental bits
> >> and ensuring both variants compile might be tricky, unless we are stripping
> >> a rather disjoint piece of the library.
> >>
> >> Kenn
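Kenn's “opt-in via flags” bullet above could be realized roughly as follows; this is a sketch that assumes Beam's `ExperimentalOptions.hasExperiment` helper, and the guard class is made up:

```java
import org.apache.beam.sdk.options.ExperimentalOptions;
import org.apache.beam.sdk.options.PipelineOptions;

final class ExperimentGuards {
  private ExperimentGuards() {}

  // Fail fast unless the user explicitly opted in to the named experiment
  // via --experiments=<name> on the command line.
  static void checkExperimentEnabled(PipelineOptions options, String experiment) {
    if (!ExperimentalOptions.hasExperiment(options, experiment)) {
      throw new IllegalStateException(
          "Experimental feature '" + experiment + "' is not enabled. "
              + "Pass --experiments=" + experiment + " to opt in.");
    }
  }
}
```

An experimental code fragment would then start with something like `ExperimentGuards.checkExperimentEnabled(options, "use_new_schema_io");` (the experiment name is hypothetical), making the experimental path opt-in rather than on by default.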
