Beginners should experience as little black magic as possible. Post tool is black magic. Schemaless is black magic. I feel we should remove both.
On Thu, 29 Apr, 2021, 2:56 am Alexandre Rafalovitch, <[email protected]> wrote:
> "Good enough/Recommended" for what? Serious question.
>
> Because it may be - more than - good enough to "send files to the
> server", but the post tool is also doing a lot of Solr business logic
> that beginner users may not have understood yet. Like automatic
> commit. Like choosing endpoint and content type based on the file
> extension. Like actually saying what it is doing. Beginners may not
> have the bandwidth to understand all those elements in order to index
> their second document (first document being the tutorial one
> copy/paste here).
>
> Removing a post tool because curl is good enough - in my personal view
> - is abandoning beginners. Unless, that "for what" is clear and the
> gap between curl and post tool is filled in some other ways, through
> better documentation or improved API or whatever.
>
> On the original question, I think the post tool is like DIH and like
> the default schema, people stick to them and push their boundaries
> because our beginner->production story is full of gaps. What to do
> about it though, I am not sure. A suggested warning seems like a
> reasonable non-harmful suggestion, though.
>
> Regards,
> Alex.
>
> On Wed, 28 Apr 2021 at 17:04, Ishan Chattopadhyaya
> <[email protected]> wrote:
> >
> > We should remove the post tool
> > Altogether. Curl is good enough and recommended.
> >
> > On Thu, 29 Apr, 2021, 2:15 am Gus Heck, <[email protected]> wrote:
> >>
> >> I've generally been of the impression/opinion that the Post Tool is
> >> really just a convenience for folks testing out solr to see what it
> >> can do, and not really meant as a production ingestion solution.
> >>
> >> A little while back I had a client that had a third party tool that
> >> "integrated with solr" by invoking post.jar on documents with a
> >> script to loop through all the files in a directory and post them
> >> (the third party software's direct example of how to integrate, not
> >> the client's idea at all). Needless to say this caused difficulties
> >> with the gigabytes of data the third party tool had stored in many
> >> directories. Of course I don't know, but I'd guess that someone with
> >> little experience was tasked with the integration with solr at the
> >> third party software company and they followed some examples... then
> >> turned them into an "integration" blissfully unaware of the
> >> limitations of what they had done.
> >>
> >> I just re-read the ref guide page on post tool, and there's nothing
> >> there to indicate to the reader that this might not be a good
> >> production level solution. Also I notice a couple of recent Jira
> >> issues regarding handling of corner cases of strange (broken)
> >> behavior or content in a web site's response, giving the impression
> >> that that user (who reported both issues) might be treading a path
> >> that will stretch the bounds of what the post tool can/should be
> >> relied upon for.
> >>
> >> https://issues.apache.org/jira/browse/SOLR-15381
> >> https://issues.apache.org/jira/browse/SOLR-15370
> >>
> >> How do folks feel about adding a warning or info box at the top of
> >> post tool docs indicating that it is not meant as a production
> >> solution, only as a quick way to test out documents. We might also
> >> say something more concrete like "virtually any use for a corpus
> >> containing over a few thousand documents is a bad idea"? ... or
> >> something like that, suggestions welcome...
> >>
> >> If folks agree then it seems that these two issues are likely to be
> >> WONTFIX.
> >>
> >> -Gus
> >>
> >> --
> >> http://www.needhamsoftware.com (work)
> >> http://www.the111shift.com (play)
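
For readers weighing curl against the post tool, the kind of per-file decisions Alex lists in the thread (choosing an endpoint and content type from the file extension, committing automatically, saying what is being done) can be sketched roughly as below. This is only an illustration under stated assumptions, not the post tool's actual code: the extension table, the `plan_request` and `describe` names, and the fallback to `/update/extract` with a generic content type are all hypothetical choices made for the example.

```python
from pathlib import Path

# Hypothetical extension table, for illustration only.
# It is NOT the post tool's real mapping.
CONTENT_TYPES = {
    ".json": "application/json",
    ".xml": "application/xml",
    ".csv": "text/csv",
}


def plan_request(filename, commit=True):
    """Decide where and how one file would be sent.

    Returns (handler_path, content_type, params). The handler paths are
    assumptions about a typical Solr setup, not a guarantee.
    """
    ext = Path(filename).suffix.lower()
    content_type = CONTENT_TYPES.get(ext)
    if content_type is not None:
        # Structured formats go straight to the update handler.
        handler = "/update"
    else:
        # Anything else is treated as a rich document (assumed fallback).
        handler = "/update/extract"
        content_type = "application/octet-stream"
    # "Automatic commit": the caller gets a commit unless it opts out.
    params = {"commit": "true"} if commit else {}
    return handler, content_type, params


def describe(filename, commit=True):
    """Build a one-line message saying what would be done with the file."""
    handler, content_type, params = plan_request(filename, commit)
    suffix = " and committing" if params else ""
    return f"POSTing {filename} to {handler} as {content_type}{suffix}"


if __name__ == "__main__":
    for name in ["books.json", "feed.xml", "manual.pdf"]:
        print(describe(name))
```

With plain curl, each of these decisions (handler path, Content-Type header, commit parameter) has to be made by the user for every request, which is the gap the thread is discussing.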
