[ https://issues.apache.org/jira/browse/SOLR-14726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17173986#comment-17173986 ]

Alexandre Rafalovitch commented on SOLR-14726:
----------------------------------------------

There are so many points in here that it is hard to answer them all together. 
But since my opinion was asked: I feel the proposed steps go in the opposite 
direction from the issue's title. Of course, I do not have as much exposure to 
real users as other participants do, so below are my strong opinions, to be 
taken with a very large sack of salt. I am also going to comment somewhat out 
of order.

(tl;dr of the part I disagree with) I think we should have a new, coherent 
(example? production?) configset with a matching large and interesting example 
dataset that we use to demonstrate both classic and new features, and we 
should keep the post tool and focus on explaining it better.
 # Removing the examples. We currently ship with 10-ish, and we are about to 
lose 5 of them (the DIH ones). I agree that what we have is confusing. 
However, removing them all is not the right answer. I feel we should have one 
complex example dataset that we use in multiple ways to demonstrate lots of 
Solr features. Techproducts used to be that, but it is rather out of date and 
not really internally consistent. Films was aiming to become that, but the 
source service has disappeared, and it had its own little issues. I have been 
looking for a potential example for a while, and the one that appeals to me 
most is [https://www.fakenamegenerator.com/] (which allows bulk generation). 
It would give us multiple field types to demonstrate, advanced 
searches/analysis, and multilingual aspects. Maybe we could have the dataset 
split into chunks, with each chunk using a different format Solr supports 
(similar to the films example). 
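The "chunks in different formats" idea might look something like this minimal sketch (file paths, field names, and the "people" collection are all made up for illustration; the indexing call needs a running local Solr, so it is shown as a comment):

```shell
# Same two hypothetical records, once as CSV and once as JSON.
cat > /tmp/people.csv <<'EOF'
id,name_s,country_s
1,Ana,BR
2,Kenji,JP
EOF
cat > /tmp/people.json <<'EOF'
[
  {"id": "1", "name_s": "Ana", "country_s": "BR"},
  {"id": "2", "name_s": "Kenji", "country_s": "JP"}
]
EOF
# Each chunk would then be indexed with the matching content type, e.g.:
#   curl 'http://localhost:8983/solr/people/update?commit=true' \
#        -H 'Content-Type: text/csv' --data-binary @/tmp/people.csv
python3 -m json.tool /tmp/people.json > /dev/null && echo "formats prepared"
```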
 # Using curl - I am with Erick that the post tool is better than curl, and we 
have worked to make it more explicit about what it is actually doing (with 
base URL vs. destination logging). I think we should explain its output better 
so people know what to look for.
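For comparison, the two approaches side by side (a sketch only: the collection name and file path are made up, and both commands need a running local Solr, so they are shown as comments):

```shell
# A tiny sample document file (hypothetical data).
cat > /tmp/sample-docs.json <<'EOF'
[
  {"id": "1", "name_s": "First sample"},
  {"id": "2", "name_s": "Second sample"}
]
EOF
# The post tool infers the content type and logs the destination URL:
#   bin/post -c mycollection /tmp/sample-docs.json
# The raw curl equivalent forces every detail to be spelled out:
#   curl 'http://localhost:8983/solr/mycollection/update?commit=true' \
#        -H 'Content-Type: application/json' --data-binary @/tmp/sample-docs.json
python3 -m json.tool /tmp/sample-docs.json > /dev/null && echo "sample ready"
```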
 # Postman/Insomnia are good in theory, but I have heard that Postman's 
company strategy has made it less and less reliable as a tool to promote. I 
don't know about Insomnia. It would be nice to have the commands available in 
some consistent way.
 # The Google Colab and output.serve_kernel_port_as_window trick looks really 
interesting and potentially promising. Could that be used instead of 
Postman/Insomnia/curl? 
 # V2 API - yes, totally
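For reference, the V1 vs. V2 contrast for something like collection creation (a sketch with a made-up collection name; both calls need a running Solr, so they are shown as comments):

```shell
# Request body for the V2 call (hypothetical collection "names").
cat > /tmp/create-names.json <<'EOF'
{"create": {"name": "names", "numShards": 1}}
EOF
# V1: everything in query parameters:
#   curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=names&numShards=1'
# V2: a JSON body posted to a resource-style path:
#   curl -X POST 'http://localhost:8983/api/collections' \
#        -H 'Content-Type: application/json' --data-binary @/tmp/create-names.json
python3 -m json.tool /tmp/create-names.json
```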
 # Docker? Maybe; no opinion. I use Docker for other projects, and it is nice. 
But I don't know whether it is an official distribution path for Solr 
(honestly, I am just out of the loop on that).
 # Auth - a good idea, I guess. As the first step, I don't know; but somewhere 
in the process.
 # First example using cloud - I was never super comfortable with that. To me, 
it feels like an ES-competition move, similar to the schemaless issue, with 
semi-expected negative consequences. I think the first example should be a 
super simple single Solr/collection start. A further example should then 
introduce cloud mode and the related differences in the schema evolution 
process. So, for example, the cloud example could take the same fake-names 
dataset and do graph analysis or machine learning on it, or use some other 
advanced feature we have only in the cloud configuration. I am aware there is 
a discussion about making everything cloud under the hood in a future Solr, 
but I don't think that was actually decided, partially because for a lot of 
people a single Solr instance is more than sufficient.
 # Make the tutorial shorter? Part of the length is the cloud instructions; 
part of it is the screenshots, which are very useful. I don't think it is the 
length that matters, but the fact that the current text is a bit all over the 
place and not very coherent, including switching between different datasets 
and schemas without properly indicating it.
 # Configset - suggested for removal as, kind of, part of suggestion 10. I 
think we need a new configset to go with the new example (back to my point 1), 
one that is coherent with the new Solr features. That is a big discussion on 
its own (e.g., do we need to demonstrate requestHandlers and initParams and 
overrides and ... all in one file?). We should also recognize that 
documentation should no longer live in the configset but in the Reference 
Guide, especially for the managed-schema files, where all comments get blown 
away on the first API change. Recognizing this would also allow us to move the 
commented-out defaults out of those files, making them shorter and easier to 
read.
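To illustrate the direction (purely a sketch, not a proposal for the actual file): a handler entry where the defaults are explicit and all explanation lives in the Reference Guide rather than in comments that an API change would discard:

```xml
<!-- Hypothetical trimmed-down handler: explicit defaults, no commented-out
     alternatives; the "why" would live in the Reference Guide. -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
  </lst>
</requestHandler>
```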
 # I do not recognize anything in the original suggestion as specifically 
addressing best practices "that should also be followed in production". That, 
to me, is a huge question, as none of the current configsets are 
'production-ready', and I don't see specific suggestions to strengthen them. 
Nor do I myself truly know what a production-ready schema/solrconfig should 
look like.
 # Regarding JSON and JSONL, the different endpoints, and the auto-detection 
of the JSON variant actually being used: I distinctly remember being extremely 
confused myself when writing a presentation on JSON in Solr 
([https://www.slideshare.net/arafalov/json-in-solr-from-top-to-bottom]), and a 
lot of conference participants reflected that same confusion. Part of it is 
that Solr's own approach is confusing, due to the evolutionary way we layered 
the implementations.
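To make the variant confusion concrete (hypothetical data and collection name; the indexing calls need a running Solr and are shown as comments):

```shell
# The same two documents as a JSON array and as JSONL (one object per line).
cat > /tmp/docs.json <<'EOF'
[
  {"id": "1", "title_s": "First"},
  {"id": "2", "title_s": "Second"}
]
EOF
cat > /tmp/docs.jsonl <<'EOF'
{"id": "1", "title_s": "First"}
{"id": "2", "title_s": "Second"}
EOF
# Both are sent as Content-Type: application/json, but to different endpoints:
#   curl 'http://localhost:8983/solr/mycoll/update?commit=true' \
#        -H 'Content-Type: application/json' --data-binary @/tmp/docs.json
#   curl 'http://localhost:8983/solr/mycoll/update/json/docs?commit=true' \
#        -H 'Content-Type: application/json' --data-binary @/tmp/docs.jsonl
# One file is a single JSON value; the other is one value per line:
python3 -c "import json; json.load(open('/tmp/docs.json')); print('array ok')"
python3 -c "import json; [json.loads(l) for l in open('/tmp/docs.jsonl')]; print('jsonl ok')"
```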

> Streamline getting started experience
> -------------------------------------
>
>                 Key: SOLR-14726
>                 URL: https://issues.apache.org/jira/browse/SOLR-14726
>             Project: Solr
>          Issue Type: Task
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Ishan Chattopadhyaya
>            Priority: Major
>              Labels: newdev
>
> The reference guide Solr tutorial is here:
> https://lucene.apache.org/solr/guide/8_6/solr-tutorial.html
> It needs to be simplified and easy to follow. Also, it should reflect our 
> best practices, that should also be followed in production. I have following 
> suggestions:
> # Make it less verbose. It is too long. On my laptop, it required 35 page 
> downs button presses to get to the bottom of the page!
> # First step of the tutorial should be to enable security (basic auth should 
> suffice).
> # {{./bin/solr start -e cloud}} <-- All references of -e should be removed.
> # All references of {{bin/solr post}} to be replaced with {{curl}}
> # Convert all {{bin/solr create}} references to curl of collection creation 
> commands
> # Add docker based startup instructions.
> # Create a Jupyter Notebook version of the entire tutorial, make it so that 
> it can be easily executed from Google Colaboratory. Here's an example: 
> https://twitter.com/TheSearchStack/status/1289703715981496320
> # Provide downloadable Postman and Insomnia files so that the same tutorial 
> can be executed from those tools. Except for starting Solr, all other steps 
> should be possible to be carried out from those tools.
> # Use V2 APIs everywhere in the tutorial
> # Remove all example modes, sample data (films, tech products etc.), 
> configsets from Solr's distribution (instead let the examples refer to them 
> from github)
> # Remove the post tool from Solr, curl should suffice.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
