On Fri, Oct 19, 2018 at 3:40 PM Trey Jones <[email protected]> wrote:
>
> Instead of "the capacity" I meant "this capacity", but should have said "this 
> feature", referring to Elasticsearch integration—though the information on 
> system capacity was still interesting.

Isn't that "capability" rather than "capacity"? (I'm trying to improve my
English here.) Though I knew that it sounded ambiguous!

> On Fri, Oct 19, 2018 at 3:57 AM, Guillaume Lederrey <[email protected]> 
> wrote:
>>
>> On Thu, Oct 18, 2018 at 4:48 PM Trey Jones <[email protected]> wrote:
>> >
>> > Hi Everyone,
>> >
>> > I'm at WikiConference NA today, and I was chatting with someone from OCLC, 
>> > and he mentioned that BlazeGraph can be configured to call out to a 
>> > full-text search engine. It looks like it only works with SOLR out of the 
>> > box, but the documentation mentions that Elasticsearch is a candidate 
>> > search endpoint.
>> >
>> > Obviously it wouldn't be worth doing any real work on investigating this 
>> > until the BlazeGraph/Amazon situation is clearer, and maybe Stas or others 
>> > have looked at it in the past and already know why it isn't worth the 
>> > added complexity, but there are some interesting use cases where combining 
>> > full text and SPARQL would be useful—for example if you are looking for a 
>> > person, you know part of their name, and some facts about them. In 
>> > general, any full-text search with additional structured data constraints.
>> >
>> > Anyone already know anything about the capacity of BlazeGraph?
>>
>> It all depends on what you mean by "capacity" and by "blazegraph". If
>> by capacity you mean do we have enough hardware, the answer is not
>> entirely easy.
>>
>> The cluster servicing the public wdqs endpoint (which probably means
>> "blazegraph" in this context) has widely varying load patterns, is
>> sometimes overloaded, and is overall difficult to size correctly
>> (especially since we don't have a good definition of what a good SLO
>> would be, see [1]).
>>
>> The internal wdqs endpoint is in a much better situation, with a more
>> controlled load and a reasonable amount of headroom. I don't have
>> good visibility into the projects that might start using this internal
>> cluster more, so that headroom might be consumed fairly quickly
>> depending on what load we add to the cluster.
>>
>> Last point: I have no idea what that blazegraph / elasticsearch
>> integration looks like, but it sounds like it might be possible to
>> generate arbitrary elasticsearch queries from SPARQL. If that's the
>> case, we don't want to expose such functionality on the public wdqs
>> endpoint, or at least not with our current production elasticsearch
>> backend as the target. That being said, it sounds like a very
>> interesting idea!
>>
>> Have fun!
>>
>>    Guillaume
>>
>>
>> [1] https://phabricator.wikimedia.org/T199228
>>
>> > Thanks,
>> > —Trey
>> >
>> > Trey Jones
>> > Sr. Software Engineer, Search Platform
>> > Wikimedia Foundation
>> > _______________________________________________
>> > Discovery mailing list
>> > [email protected]
>> > https://lists.wikimedia.org/mailman/listinfo/discovery
>>
>>
>>
>> --
>> Guillaume Lederrey
>> Operations Engineer, Search Platform
>> Wikimedia Foundation
>> UTC+2 / CEST
>



-- 
Guillaume Lederrey
Operations Engineer, Search Platform
Wikimedia Foundation
UTC+2 / CEST