[go-nuts] [ANN] RAIS: Fully open-source IIIF-compliant image server... and BONUS: the exciting story of how Go made its way into UO Libraries

Jeremy Echols Wed, 23 Nov 2016 10:25:11 -0800

*Project:*

This one's been out a long time, but I wanted to get to a place where it 
felt solid before announcing it to this list.  RAIS 
(https://github.com/uoregon-libraries/rais-image-server) is CC0-licensed 
and backs all the dynamic pan/zoom image-serving needs for Oregon Historic 
Newspapers (e.g., 
http://oregonnews.uoregon.edu/lccn/sn94052322/1888-05-03/ed-1/seq-1/).


The project conforms to the IIIF Image 2.0 spec 
(http://iiif.io/api/image/2.0/), but its main purpose is serving JP2 images 
as fast as possible, which it achieves with low-level CGO calls into 
libopenjpeg.  JP2 images are incredibly small for their quality, but 
decoding is notoriously slow.  Additionally, while there are very fast 
alternatives to JP2, very few are as space-efficient, and none are as 
memory-efficient.  For files in the 20-megapixel-plus range, we needed a 
format that doesn't require reading the whole image into memory, which 
tiled JP2 images do very well.
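
For anyone who hasn't seen IIIF before, here is a rough sketch of what a
tile request looks like under the Image API 2.0 URL pattern
({identifier}/{region}/{size}/{rotation}/{quality}.{format}).  The base
URL, the identifier, and the tileURL helper are made up for illustration;
none of this is taken from the RAIS source.

```go
package main

import (
	"fmt"
	"net/url"
)

// tileURL is a hypothetical helper: it builds an IIIF Image API 2.0 request
// for the region (x, y, w, h) of an image, scaled to outWidth pixels wide,
// with no rotation, default quality, and JPEG output.
func tileURL(base, id string, x, y, w, h, outWidth int) string {
	region := fmt.Sprintf("%d,%d,%d,%d", x, y, w, h)
	size := fmt.Sprintf("%d,", outWidth) // "w," means: scale to this width
	// Slashes inside the identifier must be percent-encoded.
	return fmt.Sprintf("%s/%s/%s/%s/0/default.jpg",
		base, url.PathEscape(id), region, size)
}

func main() {
	// example.org and the identifier are placeholders, not a real endpoint.
	fmt.Println(tileURL("http://example.org/iiif",
		"newspapers/page1.jp2", 0, 0, 1024, 1024, 256))
}
```

A pan/zoom viewer issues many requests of this shape, each for a small
region, which is why a format that can decode one region without reading
the whole image matters so much.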

It's a pretty niche service, but I think its history tells a really great 
Go story even if nobody here has need of the service itself.

*Story:*

At the time RAIS was initially created, we had a pretty big problem: we had 
somewhere in the range of 30 terabytes of TIFFs backing our pan/zoom 
viewer, and we knew that number was only going up.  While we will always 
preserve the TIFF files (we're a library, it's what we do!), keeping them 
online at all times was far more expensive than, say, tape backups or a 
dark archive.  And, of course, reading TIFFs into memory meant 20+ megs of 
RAM *per request* (these are grayscale TIFFs for those about to say 20 
megapixels should mean 60 megs of RAM).  During times of high traffic, RAM 
could become a significant bottleneck.
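
The arithmetic behind those numbers, as a quick sketch.  It assumes
uncompressed pixels at 8 bits per channel, which is my assumption for
illustration; rawBytes is a hypothetical helper.

```go
package main

import "fmt"

// rawBytes returns the in-memory size of an uncompressed image, assuming
// one byte (8 bits) per channel per pixel.
func rawBytes(pixels, channels int) int {
	return pixels * channels
}

func main() {
	const pixels = 20000000 // a 20-megapixel scan
	fmt.Printf("grayscale: %d MB\n", rawBytes(pixels, 1)/1000000)
	fmt.Printf("RGB:       %d MB\n", rawBytes(pixels, 3)/1000000)
}
```

One channel gives the 20 megs quoted above; three channels would give the
60 megs that a color image of the same dimensions needs.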

We considered pyramidal TIFFs with embedded JPGs and the IIP image server, 
but found that we would "only" save about 80% on disk in order to get 
similar quality to the JP2 files we already had.  JP2 images, on the other 
hand, saved closer to 95% on disk.

We considered pre-generating the tiles for about half a second.  But at the 
time we had about 500,000 individual newspaper pages.  Pre-generating tiles 
would absolutely not work for us.  At least, not with any kind of 
disk savings.

We considered using proprietary JP2 libraries, which we knew could solve 
the problem really well.  But we wanted the software to be as open as 
possible.  One of our biggest contributions to the newspaper world was 
getting the software which runs our site open-sourced to begin with (it 
isn't something we wrote, just something we customized heavily, and 
convinced the authors to open-source).  Having done that work, we felt like 
it was a disservice to the community if we had to use proprietary software 
just to get the open-sourced software working.

We considered a slow JP2 server with a giant cache.  The software which 
runs the site *can* serve JP2 tiles without proprietary software... but the 
initial image can take 10+ seconds to load, and heavy traffic can make it 
almost unusable.  Hence, caching!  ...but can we get a lot more hits than 
misses?  Caching thumbnails turned out to be valuable for us, but tiles... 
not so much.  Looking at what was requested in the Apache logs, it seemed 
that the tiles served in any given week were mostly (75%) tiles that had 
*not* been served throughout the entire month.  Caching would certainly 
have some benefit, but a large number of our users would be hitting the 
very slow cache-miss pages, or else our cache would have been far too big 
to be feasible.
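
A minimal sketch of that kind of check, assuming the tile paths have
already been pulled out of the access logs into two lists (the preceding
period and the week in question).  The unseenFraction helper and the
sample paths are hypothetical; this is not the script we actually ran.

```go
package main

import "fmt"

// unseenFraction returns the share of distinct tiles requested during week
// that do not appear anywhere in prior.  A high value means a cache built
// from prior traffic would miss most of the week's tiles.
func unseenFraction(prior, week []string) float64 {
	seen := make(map[string]bool, len(prior))
	for _, p := range prior {
		seen[p] = true
	}
	distinct := make(map[string]bool)
	unseen := 0
	for _, p := range week {
		if distinct[p] {
			continue // count each tile once
		}
		distinct[p] = true
		if !seen[p] {
			unseen++
		}
	}
	if len(distinct) == 0 {
		return 0
	}
	return float64(unseen) / float64(len(distinct))
}

func main() {
	// Made-up tile paths, purely for illustration.
	prior := []string{"/tiles/a", "/tiles/b"}
	week := []string{"/tiles/a", "/tiles/c", "/tiles/d", "/tiles/e", "/tiles/c"}
	fmt.Printf("%.0f%% of this week's tiles were not served before\n",
		100*unseenFraction(prior, week))
}
```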

Sometime in late 2013 or early 2014, somebody at the Library of Congress 
showed us this project which he'd called "Brikker".  It was written months 
prior in Go as a proof-of-concept to solve similar problems to ours.  It 
required pulling a specific commit of openjpeg, manually patching it, and 
compiling it.  But it was capable of serving JP2 tiles dynamically, and the 
author claimed it was fairly performant.  We decided we should at least 
look into it, even though nobody knew Go and I for one was pretty skeptical 
of this silly "new" language.

We realized quickly that something like PHP or Python just wouldn't be able 
to do what Go could, at least not with anywhere near the same performance, 
and this use case was one which demanded performance.  Calling into C is a 
bit of a pain in every language we considered, and the performance of the 
rest of the server could be an immediate bottleneck.  Better to stick with 
something that already appeared to have potential than take that risk.

So we dove into the world of Go, and slowly improved the original 
application until it could be put into production.

Now our TIFFs are somewhere far away from the web server and our RAM usage 
is incredibly low.  During peak traffic, I've seen RAIS spike to about 400 
megs of RAM.  With load testing pushing its limits, it can even jump as 
high as a gig before CPU bottlenecks slow the requests down too much.  But 
our rather modest server is still running with the same specs it's had for 
at least five years.  It has better performance than before RAIS, despite 
the fact that we have increased our image count by about 40%, we now 
support color images as part of our "born digital" initiative, and traffic 
has more than doubled.

Go, with its very low-overhead C bindings, gave us a huge win here.  We 
probably could have crafted something in C or C++ with better performance, 
but it would have been such a big undertaking in comparison to Go (even 
though I have some basic C and C++ experience, the syntax and gotchas are 
... let's just say, "tricky") that the project wouldn't have gotten the 
green light, or else would have been scrapped mid-dev.  Go's syntax is 
simple enough that I was able to jump right in and get work done quickly.  
And unlike many languages I use, I can jump from Go to other projects and 
back again without losing very much productivity.  I don't feel like I 
*have* to live in Go in order to keep it in my head.  When I'm in Rails, on 
the other hand... well, if you can't say something nice....

Today the project uses "gb" to avoid vendoring dependencies in the repo (I 
don't know what happens if we license our project as CC0 and then include a 
bunch of others' code, and I don't want to find out), and has a very simple 
Docker image available to take it for a test drive quickly (or even use it 
in production, if you have a Docker environment already).  It's no longer 
tied to funky commits of openjpeg since that project has since released a 
viable version with the functionality we needed.  And it's (hopefully) a 
lot more idiomatic than when I started.

I have to say, for a first project in a new tech, it's one of very few 
which I don't look at as a disaster.  And probably the only one I see as a 
raging success despite my inexperience.

