Re: [higgins-dev] Re: Bottleneck points in PDS service

Markus Sabadello Wed, 16 Jun 2010 11:08:48 -0700

On Wed, Jun 16, 2010 at 7:45 PM, Joseph Boyle <[email protected]> wrote:


> We could have an abstract class with more than one concrete implementation
> subclass - unparsed string, node tree possibly including unparsed strings
> for subexpressions, fully parsed but XRIs not verified, fully parsed and
> verified.
>
> Can we depend on XDI messages being in a canonical format? If equivalent
> messages with different text are possible, then string equality won't be
> enough.
>

No, there is no canonical format. The same XDI graph can be serialized in
different forms (because just like in RDF there is no built-in order of
subjects, predicates and objects). But I'm not talking about the XDI graph
itself, I'm talking about the XRIs inside the graph. Parsing =markus is
simple, but parsing e.g. (/+!15$v!3) is less simple, and Sergey suggested
this may introduce a performance bottleneck.
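
A rough sketch of the string-wrapper idea from my earlier message (quoted further down), to show where the parsing cost would move. All names here (LazySegment, ParsedXri, the parser function) are hypothetical and not the actual XDI4J classes, and the "parser" is a trivial stand-in for the ABNF-generated one:

```java
import java.util.function.Function;

public class LazySegmentSketch {

    /** Hypothetical stand-in for whatever the real ABNF parser produces. */
    static final class ParsedXri {
        final String text;
        ParsedXri(String text) { this.text = text; }
    }

    /** Hypothetical wrapper: keeps the raw string and parses only on demand. */
    static final class LazySegment {
        private final String value;
        private final Function<String, ParsedXri> parser;
        private ParsedXri parsed; // stays null until parsed() is first called

        LazySegment(String value, Function<String, ParsedXri> parser) {
            this.value = value;   // no parsing and no validation here
            this.parser = parser;
        }

        /** Runs the parser at most once; invalid input fails here, not in the constructor. */
        synchronized ParsedXri parsed() {
            if (parsed == null) {
                parsed = parser.apply(value);
            }
            return parsed;
        }

        // The common methods work on the raw string only.
        @Override public boolean equals(Object o) {
            return o instanceof LazySegment && ((LazySegment) o).value.equals(value);
        }
        @Override public int hashCode() { return value.hashCode(); }
        @Override public String toString() { return value; }
    }

    public static void main(String[] args) {
        // Stand-in validation rule, NOT the real XRI grammar.
        Function<String, ParsedXri> parser = s -> {
            if (s.isEmpty() || s.contains(" ")) {
                throw new IllegalArgumentException("invalid XRI: " + s);
            }
            return new ParsedXri(s);
        };

        LazySegment get = new LazySegment("$get", parser);
        System.out.println(get.equals(new LazySegment("$get", parser)));

        // Construction succeeds even for bad input; the error only shows up later.
        LazySegment bad = new LazySegment("not an xri", parser);
        try {
            bad.parsed();
        } catch (IllegalArgumentException e) {
            System.out.println("rejected late: " + e.getMessage());
        }
    }
}
```

The last part of main() shows the risk mentioned in that message: a server built this way could accept a message and only find out much later (or never) that one of its XRIs is invalid.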


> How can we collect data on how long parsing (and other operations) are
> taking?
>

Sergey has done research on this and posted some detailed data earlier in
this thread.
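
For context, the second bottleneck in that data (quoted further down) comes from every thread sharing one reader whose read() is synchronized. A minimal sketch of the two registry styles, using hypothetical class names rather than the actual XDIReaderRegistry code:

```java
import java.util.function.Supplier;

public class ReaderRegistrySketch {

    interface Reader {
        String read(String body);
    }

    /** Hypothetical reader with mutable state, so read() has to be synchronized. */
    static final class SimpleReader implements Reader {
        private final StringBuilder buffer = new StringBuilder();

        @Override
        public synchronized String read(String body) {
            buffer.setLength(0);
            buffer.append(body.trim());
            return buffer.toString();
        }
    }

    /** Old style: one shared instance, so concurrent callers queue up on its lock. */
    static final class SingletonRegistry {
        private final Reader shared = new SimpleReader();

        Reader getReader() {
            return shared;
        }
    }

    /** New style: a fresh instance per call, so no lock is shared between threads. */
    static final class FreshRegistry {
        private final Supplier<Reader> factory = SimpleReader::new;

        Reader getReader() {
            return factory.get();
        }
    }

    public static void main(String[] args) {
        SingletonRegistry singleton = new SingletonRegistry();
        FreshRegistry fresh = new FreshRegistry();

        // Same object every time versus a new object every time.
        System.out.println(singleton.getReader() == singleton.getReader());
        System.out.println(fresh.getReader() == fresh.getReader());
    }
}
```

With the fresh-instance style the synchronized keyword is still there, but each thread locks its own object, so the reads no longer serialize. The price is one extra allocation per request.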


>
> On Jun 16, 2010, at 10:38 AM, Markus Sabadello wrote:
>
> I may have an idea for addressing the other concern as well (the time it
> takes to parse XDI data). Not sure if this is a priority, but anyway here's
> the idea:
>
> Right now, the XDI server parses every single XRI in an XDI message. While
> it sometimes IS important for both XDI servers and clients to fully
> "understand" the XRIs (after all that's one of the advantages of XDI), in
> many other situations it is enough to internally treat them like strings.
> For example, if the XDI server encounters a $get XRI, it only needs to know
> that it is $get, but it doesn't need to understand the details that the XRI
> consists of a single subsegment with a $ global context symbol.
>
> So what I am saying is that we could modify the XRI3, XRI3Segment and other
> classes to just act as wrappers for java.lang.String, and only actually
> invoke the ABNF parser when that is necessary. For common methods such as
> equals(), hashCode() and toString() parsing is not necessary, and it should
> therefore be possible to save time.
>
> The one behavioral change from an outside perspective would be that "new
> XRI3Segment()" would no longer immediately throw an exception for invalid
> XRIs. Not sure if this is a problem. The XDI server may end up accepting
> messages that actually contain invalid XRIs.
>
> Hmm on second thought, this may introduce some risks and problems. Maybe
> not such a good idea after all :( But anyway, I wanted to quickly post it..
>
> Markus
>
> On Tue, Jun 8, 2010 at 7:00 PM, Mike McIntosh <[email protected]> wrote:
>
>> Thanks Markus,
>>
>>
>> Sergey and Valery, can you please update and build and check the status of
>> the bottleneck problem?
>>
>>
>> Regards,
>>
>> Michael McIntosh
>>
>> VP Development
>>
>> Azigo
>>
>>
>> *From:* [email protected] [mailto:
>> [email protected]] *On Behalf Of *Markus Sabadello
>> *Sent:* Tuesday, June 08, 2010 12:59 PM
>> *To:* [email protected]
>> *Cc:* [email protected]
>> *Subject:* [higgins-dev] Re: Bottleneck points in PDS service
>>
>>
>> Hello,
>>
>> I have checked in changes to return fresh XDIReader and XDIWriter
>> instances instead of singletons.
>>
>> Markus
>>
>> On Fri, May 21, 2010 at 7:44 PM, Markus Sabadello <[email protected]>
>> wrote:
>>
>> Hello Sergey,
>>
>> On Fri, May 21, 2010 at 4:13 PM, Sergey Lyakhov <[email protected]>
>> wrote:
>>
>> Markus,
>>
>> I've profiled/debugged PDS service and found two bottlenecks in XDI4J:
>>
>> 1. Most of the processing time is spent in xdi4j.xri3.impl.parser.Parser,
>> see attached 1_thread.html.
>> However this is a class generated from ABNF, and I am not sure there is a
>> way to significantly increase its performance.
>>
>>
>> Yes I agree this is probably not possible..
>> The only option here would be to use String instead of XRI3Segment, but
>> this would have big implications on the entire library, because in some
>> places the functionality of XRI3Segment is really needed.
>>
>> 2. XDI has multithreading problems. The processing time for parallel
>> threads increases linearly with the number of threads (see attached
>> 5_threads.html and 10_threads.html).
>>
>> This occurs because XDIReaderRegistry contains singleton readers, which
>> are not thread-safe and have a synchronized read() method. So all threads
>> in the EndpointServlet.readFromBody() method get the same singleton
>> instance of XDIReader from XDIReaderRegistry and wait on its read()
>> method. We can try to fix that by changing XDIReaderRegistry to return a
>> new instance of the reader instead of a singleton.
>>
>>
>> The reason why I used singletons was that I thought it's better to re-use
>> the reader objects instead of creating/destroying them all the time. But
>> yes, maybe I was wrong and it's better to return new instances for every
>> read operation.
>>
>> Another thing to make sure is that the client sets the header
>> Content-Type: text/xdi+x3, so that the XDI server uses the X3StandardReader
>> instead of the AutoReader, but I think this is happening already.
>>
>> Out of curiosity, what software did you use for profiling?
>>
>> Markus
>>
>>
>> Thanks,
>> Sergey Lyakhov
>>
>>
>>
>>
>> _______________________________________________
>> higgins-dev mailing list
>> [email protected]
>> https://dev.eclipse.org/mailman/listinfo/higgins-dev
>>
>>
