mocobeta commented on pull request #238:
URL: https://github.com/apache/lucene/pull/238#issuecomment-894810335


   I was also wondering how we (and users) would obtain the example vector 
data if we provide a "standalone" demo alongside the one integrated into 
IndexFiles/Searches.
   There are two options we could take:
   
   1. Sample random vectors from a uniform or normal distribution when 
indexing and searching.
   Of course, the generated vectors carry no meaning at all - but one could 
say that the "meaning" of vectors is up to the specific model or application, 
and what we provide is general "vector search" functionality anyway...
   
   2. Generate word representations for some publicly available corpus (e.g. 
Project Gutenberg) using GloVe, then include a small fraction of them in the 
demo module distribution.
   While proper credits are required, I think distributing a dataset derived 
from copyright-free texts and public domain embeddings (GloVe) would not be 
problematic. (Though if this leads to a difficult discussion, I wouldn't push 
this plan.)
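   
   To make option 1 concrete, here is a minimal sketch of how random demo 
vectors could be sampled. `RandomVectors` and its methods are hypothetical 
names, not part of Lucene or of this PR, and the unit-length normalization 
step is my own assumption (it is only useful if the demo compares vectors by 
dot product or cosine similarity).

```java
import java.util.Random;

/** Hypothetical helper (not a Lucene class): generates random demo vectors. */
final class RandomVectors {

  private RandomVectors() {}

  /** Samples each component uniformly from [-1, 1). */
  static float[] uniform(Random random, int dim) {
    float[] v = new float[dim];
    for (int i = 0; i < dim; i++) {
      v[i] = 2f * random.nextFloat() - 1f;
    }
    return v;
  }

  /** Samples each component from a standard normal distribution. */
  static float[] gaussian(Random random, int dim) {
    float[] v = new float[dim];
    for (int i = 0; i < dim; i++) {
      v[i] = (float) random.nextGaussian();
    }
    return v;
  }

  /** Scales v in place to unit L2 norm; an all-zero vector is left unchanged. */
  static float[] normalize(float[] v) {
    double sumSquares = 0;
    for (float x : v) {
      sumSquares += (double) x * x;
    }
    if (sumSquares == 0) {
      return v;
    }
    float inv = (float) (1.0 / Math.sqrt(sumSquares));
    for (int i = 0; i < v.length; i++) {
      v[i] *= inv;
    }
    return v;
  }
}
```

   Seeding the `Random` (e.g. `new Random(42)`) would keep the indexed and 
query vectors reproducible between runs, which seems preferable for a demo.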
   
   > So what we're doing here is different from the benchmarks since we're 
redistributing (a portion of) the GloVe data, unlike benchmarks which requires 
the user to download the wikipedia data (or does it for them).
   
   I hadn't noticed that a fraction of the GloVe data was included; yes, I 
think it would be great to have proper credits (or a notice?) for it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


