[ https://issues.apache.org/jira/browse/SOLR-14726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17173986#comment-17173986 ]
Alexandre Rafalovitch commented on SOLR-14726:
----------------------------------------------

There are so many points in here that it is hard to answer them all together. But, since my opinion was asked, I feel that the steps proposed go in the opposite direction from the issue's title. Of course, I do not have as much exposure to real users as other participants, so below are my strong opinions, but with a very large sack of salt. I am also going to comment in a somewhat random order.

(tl;dr of the parts I disagree with) I think we should have a new coherent (example? production?) configset with a matching large and interesting example dataset that we use to demonstrate both classic and new features, and we should keep the post tool and focus on explaining it better.

# Removing the examples. We currently ship with 10-ish and are about to lose 5 of them (the DIH ones). I agree that what we have is confusing. However, removing them all is not the right answer. I feel that we should have one complex example dataset that we use in multiple ways to demonstrate lots of Solr features. Techproducts used to be that, but it is rather out of date and not really internally consistent. Films was aiming to become that, but the source service has disappeared and it had its own little issues. I have been looking for a potential example for a while, and the one that appeals to me most is [https://www.fakenamegenerator.com/] (which allows bulk generation). This would give us multiple field types to demonstrate, advanced searches/analysis, and multilingual aspects. Maybe we can have the dataset split into chunks, with each chunk using a different format Solr supports (similar to the films example).
# Using curl - I am with Erick that the post tool is better than curl, and we have worked to make it more explicit about what it is actually doing (with the base URL vs. destination logging). I think we should explain its output better so people know what to look for (see the first sketch after this list for the kind of side-by-side I mean).
# Postman/Insomnia are good in theory, but I have heard that Postman's company strategy has made it less and less reliable as a tool to promote. I don't know about Insomnia. It would be nice to have the commands presented in some consistent way.
# The Google Colab output.serve_kernel_port_as_window trick looks really interesting and potentially promising. Could that be used instead of Postman/Insomnia/curl?
# V2 API - yes, totally (the third sketch after this list shows roughly what that could look like for collection creation).
# Docker? Maybe; no strong opinion. I use Docker for other projects and it is nice, but I don't know whether it is an official path for Solr distribution (honestly, I am out of the loop on that).
# Auth - good idea, I guess. As the very first step, I don't know; but somewhere in the process, yes (the last sketch after this list touches on both docker and auth).
# First example using cloud - I was never super comfortable with that. To me, it feels like an ES-competition move, similar to the schemaless issue, with semi-expected negative consequences. I think the first example should be a super simple single-Solr/single-collection start. Then a further example should introduce cloud and the related differences in the schema evolution process. So, for example, the cloud example could take the same fake-names dataset and then do graph analysis, machine learning, or some other advanced feature we only have in a cloud configuration. I am aware that there is a discussion about making everything cloud under the hood in a future Solr, but I don't think that was actually decided, partially because for a lot of people a single Solr instance is more than sufficient.
# Make the tutorial shorter? Part of the length is the cloud instructions, and part of it is the screenshots, which are very useful.
I don't think it is the length that matters so much as the fact that the current text is a bit all over the place and not super coherent, including switching between different datasets and schemas without properly indicating it.
# Configset - suggested to be removed as, kind of, part of point 10 of the original list. I think we need a new configset to go together with the new example (back to my point 1) that is coherent with the new Solr features. That's a big discussion on its own (e.g. do we need to demonstrate requestHandlers and initParams and overrides and ... all in one file?). We should also recognize that the documentation should no longer live in the configset but in the reference guide, especially for the managed-schema files, where all comments get blown away on the first API change. Recognizing this would allow us to move commented-out defaults out of those files as well, making them shorter and easier to read.
# I do not recognize anything in the original suggestions as specifically addressing "that should also be followed in production". That, to me, is a huge question, as none of the current configsets are 'production ready' and I don't see specific suggestions to strengthen them. Nor do I, myself, truly know what a production-ready schema/solrconfig should look like.
# Regarding JSON and JSONL, the different endpoints, and the auto-detection of which JSON variant is actually being used: I distinctly remember being extremely confused myself when I was writing the presentation on JSON in Solr ([https://www.slideshare.net/arafalov/json-in-solr-from-top-to-bottom]), and a lot of conference participants reflected that confusion of their own. Part of that is that Solr's own approach is confusing due to the evolutionary nature of how we layered the implementations (the second sketch after this list shows what I mean).
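To make a couple of the command-level points above more concrete, here are a few rough sketches. They are only my guesses at what the tutorial snippets could look like; I have not re-run them, and the paths and field names are just the usual ones from the distribution. The first one is the post tool vs. curl comparison from point 2, starting from the simple single-node flow I argue for in point 8:

{code:bash}
# Start a single node and create a plain (non-cloud) core.
bin/solr start
bin/solr create -c films

# Index the bundled films data with the post tool; its logging shows the
# resolved base URL and the destination it actually posts to.
bin/post -c films example/films/films.json

# The rough curl equivalent of the same step.
curl -X POST -H 'Content-Type: application/json' \
     'http://localhost:8983/solr/films/update?commit=true' \
     --data-binary @example/films/films.json
{code}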
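The JSON confusion from the last point is easiest to show with the same kind of document going in three different ways. The field names here are made up (relying on the default *_txt dynamic field), and these are sketches rather than verified commands:

{code:bash}
# 1. /update with Solr's own JSON command syntax.
curl -H 'Content-Type: application/json' \
     'http://localhost:8983/solr/films/update?commit=true' \
     -d '{"add": {"doc": {"id": "doc1", "name_txt": "first example"}}}'

# 2. /update with a bare JSON array of documents (no command wrapper).
curl -H 'Content-Type: application/json' \
     'http://localhost:8983/solr/films/update?commit=true' \
     -d '[{"id": "doc2", "name_txt": "second example"}]'

# 3. /update/json/docs, which treats the payload as documents rather than commands.
curl -H 'Content-Type: application/json' \
     'http://localhost:8983/solr/films/update/json/docs?commit=true' \
     -d '{"id": "doc3", "name_txt": "third example"}'
{code}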
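For the V2 API point (and the original suggestion to turn {{bin/solr create}} into curl commands), collection creation is probably the clearest before/after. This assumes a SolrCloud node on the default port, and I have not double-checked the exact V2 payload shape against the current version:

{code:bash}
# V1: Collections API (needs a cloud node, e.g. bin/solr start -c).
curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=films&numShards=1&replicationFactor=1'

# V2: the same operation against the /api endpoint.
curl -X POST -H 'Content-Type: application/json' \
     'http://localhost:8983/api/collections' \
     -d '{"create": {"name": "films", "numShards": 1, "replicationFactor": 1}}'
{code}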
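And since the original description (quoted below) also asks for docker-based startup and for basic auth as the first step, this is roughly what I imagine those would look like. The image tag and the auth tool flags are from memory, so treat them as assumptions rather than verified commands:

{code:bash}
# Docker-based startup using the official image, with a core pre-created.
docker run -d -p 8983:8983 --name solr_tutorial solr:8.6 solr-precreate gettingstarted

# Enable basic auth on a local install (writes/uploads security.json);
# after this, every request needs credentials.
bin/solr auth enable -credentials solr:SolrRocks
curl -u solr:SolrRocks 'http://localhost:8983/solr/gettingstarted/select?q=*:*'
{code}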
> Streamline getting started experience
> -------------------------------------
>
>             Key: SOLR-14726
>             URL: https://issues.apache.org/jira/browse/SOLR-14726
>         Project: Solr
>      Issue Type: Task
>  Security Level: Public (Default Security Level. Issues are Public)
>        Reporter: Ishan Chattopadhyaya
>        Priority: Major
>          Labels: newdev
>
> The reference guide Solr tutorial is here:
> https://lucene.apache.org/solr/guide/8_6/solr-tutorial.html
> It needs to be simplified and easy to follow. Also, it should reflect our best practices, that should also be followed in production. I have the following suggestions:
> # Make it less verbose. It is too long. On my laptop, it required 35 page-down button presses to get to the bottom of the page!
> # First step of the tutorial should be to enable security (basic auth should suffice).
> # {{./bin/solr start -e cloud}} <-- All references of -e should be removed.
> # All references of {{bin/solr post}} to be replaced with {{curl}}.
> # Convert all {{bin/solr create}} references to curl of collection creation commands.
> # Add docker based startup instructions.
> # Create a Jupyter Notebook version of the entire tutorial, and make it easily executable from Google Colaboratory. Here's an example: https://twitter.com/TheSearchStack/status/1289703715981496320
> # Provide downloadable Postman and Insomnia files so that the same tutorial can be executed from those tools. Except for starting Solr, all other steps should be possible to carry out from those tools.
> # Use V2 APIs everywhere in the tutorial.
> # Remove all example modes, sample data (films, tech products etc.) and configsets from Solr's distribution (instead let the examples refer to them from github).
> # Remove the post tool from Solr; curl should suffice.