Hi Alan,

I suspect you are seeing slower performance with "expand" specified simply
because of how that "expand" parameter works.  By default, the REST API
calls return minimal information (to keep requests quick).  But, if you
require much more detailed information, the "expand" parameter is available
to tell the REST API "I really need more information here".  Simply put,
when you ask for more information, requests will take longer (obviously).

That said, the way in which "expand" is currently implemented is NOT
ideal.  When you tell the DSpace 5.x or 6.x REST API that you want
"expand=metadata" (i.e. give me all the metadata), it literally loops
through all of the item's metadata values, checks whether each field is
flagged as hidden (via "isHidden"), and adds the non-hidden ones one by
one to the response:
https://github.com/DSpace/DSpace/blob/dspace-6_x/dspace-rest/src/main/java/org/dspace/rest/common/Item.java#L71

The same thing happens when you say "expand=bitstreams" (i.e. give me all
the bitstreams)... it literally loops through all bundles, finding all
accessible bitstreams, and adds them one by one to the response:
https://github.com/DSpace/DSpace/blob/dspace-6_x/dspace-rest/src/main/java/org/dspace/rest/common/Item.java#L121
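
If you want a rough idea of which "expand" option costs the most on your
own instance, one option is to time each of them separately with curl's
"-w" write-out variable. This is only a sketch: BASE_URL is a placeholder
and the function names (items_url, time_expand) are made up for
illustration. Also keep in mind that "time_total" covers the whole request,
network included, so it is not a real server-side profile.

```shell
#!/bin/sh
# Sketch only: BASE_URL is a placeholder and the function names are hypothetical.
BASE_URL="${BASE_URL:-https://production.example.com/rest}"

# Build an /items URL for one expand value (empty string for none),
# a page size (limit) and a starting offset.
items_url() {
    expand="$1"; limit="$2"; offset="$3"
    if [ -n "$expand" ]; then
        printf '%s/items?expand=%s&limit=%s&offset=%s\n' \
            "$BASE_URL" "$expand" "$limit" "$offset"
    else
        printf '%s/items?limit=%s&offset=%s\n' "$BASE_URL" "$limit" "$offset"
    fi
}

# Print "<expand> <seconds>" for a single request, discarding the body.
time_expand() {
    url=$(items_url "$1" "$2" "$3")
    secs=$(curl -s -o /dev/null -w '%{time_total}' "$url")
    printf '%s %s\n' "${1:-none}" "$secs"
}

# Usage (hits the network, so it is left commented out):
# for e in "" metadata bitstreams parentCommunityList; do
#     time_expand "$e" 100 0
# done
```

Running each option a few times, as you did in your own benchmarks, should
show whether one of them dominates or whether the cost is spread evenly.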


So, as you can see, if you are including a lot of "expand" options in your
request, this will quickly slow things down...*unless* you reduce your
page size (e.g. use a lower "limit" of 20 or similar).
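
As a rough illustration of harvesting with a smaller "limit", here is a
sketch that walks the /items endpoint page by page. It assumes you already
know the approximate item count (e.g. your 75,000) and pass it in as the
total. BASE_URL, the function names (page_offsets, harvest) and the output
file naming are all placeholders of mine, not anything DSpace provides.

```shell
#!/bin/sh
# Sketch only: BASE_URL and the function names are hypothetical placeholders.
BASE_URL="${BASE_URL:-https://production.example.com/rest}"

# Print the offset of each page needed to cover TOTAL items, LIMIT per page.
page_offsets() {
    total="$1"; limit="$2"; offset=0
    # Refuse a non-positive page size, which would otherwise loop forever.
    [ "$limit" -gt 0 ] || return 1
    while [ "$offset" -lt "$total" ]; do
        printf '%s\n' "$offset"
        offset=$((offset + limit))
    done
}

# Fetch every page into OUTDIR, one file per page. Stops at the first
# failed request (curl -f treats HTTP errors such as 500 as failures).
harvest() {
    total="$1"; limit="$2"; expand="$3"; outdir="$4"
    mkdir -p "$outdir" || return 1
    for off in $(page_offsets "$total" "$limit"); do
        curl -sf -o "$outdir/items-$off.json" \
            "$BASE_URL/items?expand=$expand&limit=$limit&offset=$off" || return 1
    done
}

# Usage (hits the network, so it is left commented out):
# harvest 75000 20 metadata ./pages
```

For example, "page_offsets 50 20" prints 0, 20 and 40. With a limit of 20,
75,000 items works out to 3,750 requests, so this trades fewer slow
requests for many quicker ones. Running them one after another also avoids
the concurrent load that seemed to trigger your HTTP 500 errors.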

As a side note, the way in which our REST API handles such requests is
changing drastically in the DSpace 7 REST API.  In the development of DSpace 7,
we quickly realized that the DSpace 5.x / 6.x REST API has several areas
where major performance issues present themselves. This is why we are
deprecating this old 5.x-6.x REST API in the DSpace 7 release (it will be
dropped entirely in DSpace 8). DSpace 7 will be providing a brand new,
optimized, fully-featured REST API as a replacement. I know this doesn't
solve your immediate issues, but I just wanted to assure you that you are
not alone in finding these performance & usability issues with the current
REST API.

- Tim

On Tue, Oct 16, 2018 at 4:31 PM Alan Orth <[email protected]> wrote:

> Hello,
>
> If I use several expands while iterating over the results of the REST
> API's /items endpoint the request takes about ten times longer than without
> the expands. In my unscientific benchmarks the performance is consistently
> poor on both our production and development DSpace instances. A few runs on
> each server, with and without expands:
>
> $ time curl -s 'https://production.example.com/rest/items?expand=metadata,bitstreams,parentCommunityList&limit=100&offset=0' > /dev/null
> ...
> 0.35s user 0.06s system 1% cpu 25.133 total
> 0.31s user 0.04s system 1% cpu 25.223 total
> 0.27s user 0.06s system 1% cpu 27.858 total
>
> $ time curl -q 'https://production.example.com/rest/items?limit=100&offset=0' > /dev/null
> 0.03s user 0.01s system 1% cpu 3.085 total
> 0.03s user 0.01s system 1% cpu 2.800 total
> 0.03s user 0.02s system 1% cpu 3.008 total
>
> $ time curl -s 'https://development.example.com/rest/items?expand=metadata,bitstreams,parentCommunityList&limit=100&offset=0' > /dev/null
> ...
> 0.22s user 0.03s system 1% cpu 17.248 total
> 0.23s user 0.02s system 1% cpu 16.856 total
> 0.23s user 0.04s system 1% cpu 16.460 total
>
> $ time curl -s 'https://development.example.com/rest/items?limit=100&offset=0' > /dev/null
> 0.04s user 0.01s system 1% cpu 3.542 total
> 0.02s user 0.02s system 1% cpu 3.565 total
> 0.01s user 0.02s system 0% cpu 3.480 total
>
> These systems are both running Ubuntu 16.04, PostgreSQL 9.5, Java 8 (one
> Oracle, one OpenJDK), and DSpace 5.8 with lots of RAM, SSDs, and four or
> more CPUs each. Lots of people are harvesting us and it takes forever to
> iterate over our 75,000 items. Not to mention, if we have more than a few
> concurrently we start returning HTTP 500 errors!
>
> Where is the bottleneck in the REST API? How can I profile this? Is this
> something that can be improved with a query cache or database indexes in
> PostgreSQL?
>
> Thanks!
> --
> Alan Orth
> [email protected]
> https://picturingjordan.com
> https://englishbulgaria.net
> https://mjanja.ch
> "In heaven all the interesting people are missing." ―Friedrich Nietzsche
>
> --
> All messages to this mailing list should adhere to the DuraSpace Code of
> Conduct: https://duraspace.org/about/policies/code-of-conduct/
> ---
> You received this message because you are subscribed to the Google Groups
> "DSpace Technical Support" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/dspace-tech.
> For more options, visit https://groups.google.com/d/optout.
>
-- 
Tim Donohue
Technical Lead for DSpace & DSpaceDirect
DuraSpace.org | DSpace.org | DSpaceDirect.org
