Great discussion. I'm glad to see it happening, and lucky to have caught it on the mailing list given its high volume.
I had this same conversation with Patrick Wendell a few Spark Summits ago. At the time, SO was not even listed as a resource, and the idea was to make it the primary "go-to" place for questions.

Having contributed to both the list (in its early days) and SO, the biggest hurdle IMO is how to deal with lazy people. These days, at SO, I spend more time leaving comments than answering, in an attempt to moderate the "show some effort" requirement and to clarify unclear questions.

It's my impression that the mailing list is much friendlier to the "plz send me da code" folk and will indeed answer questions that would otherwise get down-voted or closed at SO. That also shows in the high email volume, which at the same time lowers the list's value for those of us who get overwhelmed. It's hard to separate authentic efforts at getting started, which deserve help and encouragement, from "work dumpers" who abuse resources to get their thing done and need moderating. Also, beginner questions always repeat, and a mailing list has no features to help with that.

The model I had imagined roughly follows the "Odersky scale":

- Users new to the technology and basic "how to" questions belong in Stack Overflow.
  => The search and de-duplication features should help in getting an answer if already present, reducing the load.
- Advanced discussions and troubleshooting belong in users@
- Library bugs, new features and improvements belong in dev@

Of course, there's no hard line between these levels, and it would require contributor discretion aided by some routing procedure:

- Spark documentation should establish Stack Overflow as the main go-to resource.
- Contributors on the list should kindly redirect "intro level questions" to Stack Overflow.
- SO contributors should redirect potential bugs and questions deserving a deeper discussion to users@ or dev@ as needed
- users@ -> dev@ as today
- Cross-posting SO + users@ should be discouraged. The idea is to create efficient channels.
A good resource on how and where to ask questions would be a great routing channel between the levels above. I'm willing to help with moderation efforts on "Spark Overflow" :-) to get this going.

The Spark community has always been very welcoming and that spirit should be preserved. We just need to channel the efforts in a more efficient way.

my 2c, Gerard.

On Mon, Nov 7, 2016 at 11:24 PM, Maciej Szymkiewicz <mszymkiew...@gmail.com> wrote:

> Just a couple of random thoughts regarding Stack Overflow...
>
> - If we are thinking about shifting focus towards SO, all attempts at micromanaging should be discarded right at the beginning. Especially things like meta tags, which are discouraged and "burninated" (https://meta.stackoverflow.com/tags/burninate-request/info), or thread bumping. Depending on the context these won't be manageable, will go against community guidelines, or will simply be obsolete.
> - Lack of expertise is unlikely to be an issue. Even now there are a number of advanced Spark users on SO. Of course, the more the merrier.
>
> Things that can be easily improved:
>
> - Identifying, improving and promoting canonical questions and answers. It means closing duplicates, suggesting edits to improve existing answers, providing alternative solutions. This can also be used to identify gaps in the documentation.
> - Providing a set of clear posting guidelines to reduce the effort required to identify the problem (think about http://stackoverflow.com/q/5963269 a.k.a. How to make a great R reproducible example?)
> - Helping users decide if a question is a good fit for SO (see below). API questions are a great fit, debugging problems like "my cluster is slow" are not.
> - Actively cleaning (closing, deleting) off-topic and low quality questions. The less junk to sieve through, the better the chance of good questions being answered.
> - Repurposing and actively moderating SO docs (https://stackoverflow.com/documentation/apache-spark/topics). Right now most of the stuff that goes there is useless, duplicated or plagiarized, or borderline spam.
> - Encouraging the community to monitor featured (https://stackoverflow.com/questions/tagged/apache-spark?sort=featured) and active & upvoted & unanswered (https://stackoverflow.com/unanswered/tagged/apache-spark) questions.
> - Implementing some procedure to identify questions which are likely to be bugs or material for feature requests. Personally I am quite often tempted to simply send a link to the dev list, but I don't think it is really acceptable.
> - Animating a Spark related chat room. I tried this a couple of times but to no avail. Without a certain critical mass of users it just won't work.
>
> On 11/07/2016 07:32 AM, Reynold Xin wrote:
>
> This is an excellent point. If we do go ahead and feature SO more prominently as a way for users to ask questions, as someone who knows SO very well, would you be willing to help write a short guideline (ideally the shorter the better, which makes it hard) to direct what goes to user@ and what goes to SO?
>
> Sure, I'll be happy to help if I can.
>
> On Sun, Nov 6, 2016 at 9:54 PM, Maciej Szymkiewicz <mszymkiew...@gmail.com> wrote:
>
>> Damn, I always thought that the mailing list is only for nice and welcoming people and there is nothing for me to do here >:)
>>
>> To be serious though, there are many questions on the users list which would fit just fine on SO, but that is not true in general. There are dozens of questions which are too broad, opinion based, ask for external resources and so on. If you want to direct users to SO you have to help them decide if it is the right channel.
>> Otherwise it will just create a really bad experience for both those seeking help and active answerers. The former will be downvoted and bashed, the latter will have to deal with handling all the junk, and the number of active Spark users with moderation privileges is really low (with only Massg and me being able to directly close duplicates).
>>
>> Believe me, I've seen this before.
>>
>> On 11/07/2016 05:08 AM, Reynold Xin wrote:
>>
>> You have substantially underestimated how opinionated people can be on mailing lists too :)
>>
>> On Sunday, November 6, 2016, Maciej Szymkiewicz <mszymkiew...@gmail.com> wrote:
>>
>>> You have to remember that the Stack Overflow crowd (like me) is highly opinionated, so many questions which could be just fine on the mailing list will be quickly downvoted and / or closed as off-topic. Just saying...
>>>
>>> --
>>> Best,
>>> Maciej
>>>
>>> On 11/07/2016 04:03 AM, Reynold Xin wrote:
>>>
>>> OK I've checked on the ASF member list (which is private so there is no public archive).
>>>
>>> It is not against any ASF rule to recommend StackOverflow as a place for users to ask questions. I don't think we can or should delete the existing user@spark list either, but we can certainly make SO more visible than it is.
>>>
>>> On Wed, Nov 2, 2016 at 10:21 AM, Reynold Xin <r...@databricks.com> wrote:
>>>
>>>> Actually after talking with more ASF members, I believe the only policy is that development decisions have to be made and announced on ASF properties (dev list or jira), but user questions don't have to.
>>>>
>>>> I'm going to double check this. If it is true, I would actually recommend us moving the Q&A part of the user list entirely over to stackoverflow, or at least making that the recommended way rather than the existing user list, which is not very scalable.
>>>>
>>>> On Wednesday, November 2, 2016, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:
>>>>
>>>>> We’ve discussed several times upgrading our communication tools, as far back as 2014 and maybe even before that too. The bottom line is that we can’t due to ASF rules requiring the use of ASF-managed mailing lists.
>>>>>
>>>>> For some history, see this discussion:
>>>>>
>>>>> - https://mail-archives.apache.org/mod_mbox/spark-user/201412.mbox/%3CCAOhmDzfL2COdysV8r5hZN8f=NqXM=f=oY5NO2dHWJ_kVEoP+Ng@mail.gmail.com%3E
>>>>> - https://mail-archives.apache.org/mod_mbox/spark-user/201501.mbox/%3CCAOhmDzec1JdsXQq3dDwAv7eLnzRidSkrsKKG0xKw=TKTxY_sYw@mail.gmail.com%3E
>>>>>
>>>>> (It’s ironic that it’s difficult to follow the past discussion on why we can’t change our official communication tools due to those very tools…)
>>>>>
>>>>> Nick
>>>>>
>>>>> On Wed, Nov 2, 2016 at 12:24 PM Ricardo Almeida <ricardo.alme...@actnowib.com> wrote:
>>>>>
>>>>>> I feel Assaf's point is quite relevant if we want to move this project forward from the Spark user perspective (as I do). In fact, we're still using 20th century tools (mailing lists) with some add-ons (like Stack Overflow).
>>>>>>
>>>>>> As usual, Sean and Cody's contributions are very much to the point. I feel it is indeed a matter of culture (hard to enforce) and tools (much easier). Isn't it?
>>>>>>
>>>>>> On 2 November 2016 at 16:36, Cody Koeninger <c...@koeninger.org> wrote:
>>>>>>
>>>>>>> So concrete things people could do
>>>>>>>
>>>>>>> - users could tag subject lines appropriately to the component they're asking about
>>>>>>>
>>>>>>> - contributors could monitor user@ for tags relating to components they've worked on. I'd be surprised if my miss rate for any mailing list questions well-labeled as Kafka was higher than 5%
>>>>>>>
>>>>>>> - committers could be more aggressive about soliciting and merging PRs to improve documentation. It's a lot easier to answer even poorly-asked questions with a link to relevant docs.
>>>>>>>
>>>>>>> On Wed, Nov 2, 2016 at 7:39 AM, Sean Owen <so...@cloudera.com> wrote:
>>>>>>> > There's already reviews@ and issues@. dev@ is for project development itself and I think is OK. You're suggesting splitting up user@ and I sympathize with the motivation. Experience tells me that we'll have a beginner@ that's then totally ignored, and people will quickly learn to post to advanced@ to get attention, and we'll be back where we started. Putting it in JIRA doesn't help. I don't think this is a problem that is merely down to lack of process. It actually requires cultivating a culture change on the community list.
>>>>>>> >
>>>>>>> > On Wed, Nov 2, 2016 at 12:11 PM Mendelson, Assaf <assaf.mendel...@rsa.com> wrote:
>>>>>>> >>
>>>>>>> >> What I am suggesting is basically to fix that.
>>>>>>> >>
>>>>>>> >> For example, we might say that mailing list A is only for voting, mailing list B is only for PRs, and have something like stack overflow for developer questions (I would even go as far as to have beginner, intermediate and advanced mailing lists for users and beginner/advanced for dev).
>>>>>>> >>
>>>>>>> >> This can easily be done using stack overflow tags, however, that would probably be harder to manage.
>>>>>>> >>
>>>>>>> >> Maybe using special jira tags and manage it in jira?
>>>>>>> >>
>>>>>>> >> Anyway as I said, the main issue is not user questions (except maybe advanced ones) but more dev questions. It is so easy to get lost in the chatter that it makes it very hard for people to learn spark internals…
>>>>>>> >>
>>>>>>> >> Assaf.
>>>>>>> >>
>>>>>>> >> From: Sean Owen [mailto:so...@cloudera.com]
>>>>>>> >> Sent: Wednesday, November 02, 2016 2:07 PM
>>>>>>> >> To: Mendelson, Assaf; dev@spark.apache.org
>>>>>>> >> Subject: Re: Handling questions in the mailing lists
>>>>>>> >>
>>>>>>> >> I think that unfortunately mailing lists don't scale well. This one has thousands of subscribers with different interests and levels of experience. For any given person, most messages will be irrelevant. I also find that a lot of questions on user@ are not well-asked, aren't an SSCCE (http://sscce.org/), and are not something most people are going to bother replying to even if they could answer. I almost entirely ignore user@ because there are higher-priority channels like PRs to deal with, that already have hundreds of messages per day. This is why little of it gets an answer -- too noisy.
>>>>>>> >>
>>>>>>> >> We have to have official mailing lists, in any event, to have some official channel for things like votes and announcements. It's not wrong to ask questions on user@ of course, but a lot of the questions I see could have been answered with research of existing docs or looking at the code. I think that given the scale of the list, it's not wrong to assert that this is sort of a prerequisite for asking thousands of people to answer one's question. But we can't enforce that.
>>>>>>> >>
>>>>>>> >> The situation will get better to the extent people ask better questions, help other people ask better questions, and answer good questions. I'd encourage anyone feeling this way to try to help along those dimensions.
>>>>>>> >>
>>>>>>> >> On Wed, Nov 2, 2016 at 11:32 AM assaf.mendelson <assaf.mendel...@rsa.com> wrote:
>>>>>>> >>
>>>>>>> >> Hi,
>>>>>>> >>
>>>>>>> >> I know this is a little off topic but I wanted to raise an issue about handling questions in the mailing list (this is true both for the user mailing list and the dev but since there are other options such as stack overflow for user questions, this is more problematic in dev).
>>>>>>> >>
>>>>>>> >> Let’s say I ask a question (as I recently did). Unfortunately this was during spark summit in Europe so probably people were busy. In any case no one answered.
>>>>>>> >>
>>>>>>> >> The problem is, that if no one answers very soon, the question will almost certainly remain unanswered because new messages will simply drown it.
>>>>>>> >>
>>>>>>> >> This is a common issue not just for questions but for any comment or idea which is not immediately picked up.
>>>>>>> >>
>>>>>>> >> I believe we should have a method of handling this.
>>>>>>> >>
>>>>>>> >> Generally, I would say these types of things belong in stack overflow; after all, the way it is built is perfect for this. More seasoned spark contributors and committers can periodically check out unanswered questions and answer them.
>>>>>>> >>
>>>>>>> >> The problem is that stack overflow (as well as other targets such as the databricks forums) tends to have a more user-based orientation. This means that any spark internal question will almost certainly remain unanswered.
>>>>>>> >>
>>>>>>> >> I was wondering if we could come up with a solution for this.
>>>>>>> >>
>>>>>>> >> Assaf.
>
> --
> Maciej Szymkiewicz