Note that access logs may not be viewable without superuser access, or could
have been automatically transferred to another server for secure storage
and analysis, if your system admins felt it necessary to comply with local
privacy or data protection regulations.


on behalf of Steele, Henry
Sent: 25 March 2020 13:06
To: Archivesspace Users Group
Subject: Re: [Archivesspace_Users_Group] how to find if a certain page was
accessed in the PUI

Thank you, Blake.  This is all really helpful.  I will see if I can use some of 
these strategies to look through our logs.

It may not be there. Our PUI log level is “fatal” and I’m not sure the Apache
logs (/var/log/httpd) contain this kind of information. But it’s certainly good
to know how to look.
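One way to check which files under /var/log/httpd mention a particular page is a small loop over the directory. This is a minimal sketch, not a tested tool: the function name which_logs_mention and the example record path are hypothetical, it assumes compressed rotated logs end in .gz, and reading that directory may need root.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<count> <file>" for every log file in a
# directory that mentions a page path at least once.
# Assumptions: rotated logs, if compressed, end in .gz; reading the
# directory may require root.
which_logs_mention() {
  local dir="$1"    # e.g. /var/log/httpd
  local page="$2"   # e.g. /repositories/2/resources/134 (hypothetical record)
  local f n
  for f in "$dir"/*; do
    [ -f "$f" ] || continue
    case "$f" in
      # "|| true" keeps a zero-match grep (exit status 1) from aborting under set -e
      *.gz) n=$(gzip -dc "$f" | grep -cF -- "$page" || true) ;;
      *)    n=$(grep -cF -- "$page" "$f" || true) ;;
    esac
    if [ "$n" -gt 0 ]; then
      printf '%s %s\n' "$n" "$f"
    fi
  done
}

# Usage (paths are examples):
# which_logs_mention /var/log/httpd /repositories/2/resources/134
```

It is a plain substring match, so /repositories/2/resources/134 would also match /repositories/2/resources/1345. Files with no matches are simply not printed.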

Henry Steele

Systems Librarian

Tufts University Library Technology Services


On Behalf Of Blake
Sent: Wednesday, March 25, 2020 9:01 AM
To: Archivesspace Users Group
Subject: Re: [Archivesspace_Users_Group] how to find if a certain page was
accessed in the PUI

I know quite a few people use Google Analytics, which is not something I find
useful at all, but it's used quite often. Check out Matomo for an open source
analytics product; there are many others. I know Matomo gives you the ability
to customize things, and I bet it could be quite useful, though I've not
touched it in many years.

I think your best bet is to get to know your Apache logs. You should be able to
get something useful out of them, but you'll need to learn what you're logging
there, and maybe change it up. Read up on Apache's LogFormat directive; it's
pretty flexible and you can customize it on your server. You can also give each
domain name (virtual host) its own log file with CustomLog, so that might help
as well. If you're running the PUI and staff sides on different URLs or
prefixes, that will help set them apart in the logs. This is all one of those
"It Depends" kinds of things. Using grep/awk/sed etc. will let you pull out
different things from the logs. Try tailing the log as you look at different
things on the site and see how those get logged, then work up some simple
greps to pull out just what you need every day. This is a simple one I use to
see the busiest sites on a server:

cat /var/log/apache2/other_vhosts_access.log.1 | awk '{print $1}' | sort | uniq -c | sort -nr | head -20

(If you're looking at that and thinking "You don't need cat in there, dummy" I 
know I know, old habits die hard)
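Along the same lines, here is a sketch that counts hits on a single page per day. It assumes the "combined" log format (nginx's default, and what many Apache installs use), where the timestamp starts in field 4 and the request path is field 7; count_page_hits and the example path are hypothetical, and the field numbers will need adjusting if your LogFormat is different.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count requests for one page per day in an access log.
# Assumes combined-style lines, where field 4 is "[dd/Mon/yyyy:HH:MM:SS"
# and field 7 is the request path. Adjust field numbers for other formats.
count_page_hits() {
  local logfile="$1"   # e.g. /var/log/httpd/access_log
  local page="$2"      # e.g. /repositories/2/resources/134 (hypothetical record)
  awk -v page="$page" '
    {
      path = $7
      sub(/\?.*/, "", path)        # ignore any query string
      if (path == page) {
        day = substr($4, 2, 11)    # "[25/Mar/2020:..." -> "25/Mar/2020"
        hits[day]++
      }
    }
    END { for (d in hits) print hits[d], d }
  ' "$logfile" | sort -nr          # busiest days first
}

# Usage (paths are examples):
# count_page_hits /var/log/httpd/access_log /repositories/2/resources/134
```

Each output line is "<count> <day>". Unlike a plain grep, it compares the whole path, so /resources/134 does not also pick up /resources/1345.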

You could do the same kind of grep work on the archivesspace.out log file and
get something out of it. You might need to experiment with the log level on
that to see what you can get; DEBUG is probably way too much.

Here are some nginx log lines, based on real logs with some details changed to
protect the innocent.

Here's one you might see quite often: if someone is logged into the staff side,
you'll see this POST checking their session:

    - [25/Mar/2020:12:32:22 +0000] "POST /update_monitor/poll HTTP/1.1" 200 4751 "" "lock_version=12&uri=%2Frepositories%2F5%2Fresources%2F134" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:71.0) Gecko/20100101 Firefox/71.0" "-"

Here's another one, someone looking at a resource on the staff side:

    - [25/Mar/2020:12:31:03 +0000] "GET HTTP/1.1" 200 9839 "" "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36" "-"

And here's a bot crawling the public side:

    - [25/Mar/2020:12:29:58 +0000] "GET /repositories/2/archival_objects/97930 HTTP/1.1" 200 21473 "-" "-" "Mozilla/5.0 (compatible; DotBot/1.1;,<>)" "-"
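If the PUI has its own virtual host and access log, a sketch like this gives a rough "top pages" list while dropping the kinds of noise shown in these examples: crawlers like DotBot and the staff side's POST /update_monitor/poll checks. top_pui_pages is a hypothetical name, the field positions assume combined-style lines that start with the client address, and matching "bot|crawl|spider" anywhere in the line is only a heuristic (it would also skip a page whose path contains "bot").

```shell
#!/usr/bin/env bash
# Hypothetical helper: the 20 most-requested paths in an access log, skipping
# self-identified crawlers and anything that isn't a successful GET (which
# drops the staff side's POST /update_monitor/poll session checks).
# Assumes combined-style fields: $6 = "GET (with its leading quote),
# $7 = path, $9 = status. The bot test is a rough heuristic on the whole line.
top_pui_pages() {
  local logfile="$1"   # ideally the PUI vhost's own access log
  awk '
    tolower($0) ~ /bot|crawl|spider/ { next }   # DotBot, Googlebot, etc.
    $6 != "\"GET" || $9 != "200"     { next }   # successful page views only
    {
      path = $7
      sub(/\?.*/, "", path)                     # ignore any query string
      hits[path]++
    }
    END { for (p in hits) print hits[p], p }
  ' "$logfile" | sort -nr | head -20
}

# Usage (path is an example):
# top_pui_pages /var/log/httpd/pui_access_log
```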

Depending on how you configure your Apache/nginx/whatever logs, those log lines 
will look different and you can log a bunch of different things.

On the ArchivesSpace side (archivesspace/logs/archivesspace.out), the logs can
look different depending on your log level. Here's one with the level set to
debug, showing the indexer doing some work:

INFO: [collection1] webapp= path=/update params={} 
/repositories/2/archival_objects/33930#pui, ... (25 adds)]} 0 6

Here's one line from me viewing a resource on the staff side. As you can see,
it'll be a bit more challenging to get useful stuff out of this log, but it's
in there:

[2020-03-25T08:45:29-04:00] INFO: [collection1] webapp= path=/select 
 hits=0 status=0 QTime=61
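For archivesspace.out, one rough sketch is to count the Solr search lines (path=/select) per hour, leaving out the indexer's path=/update lines. It assumes the timestamped line format shown in the staff-side example and will mix staff, PUI and indexer searches together, so treat it as an activity gauge rather than a count of page views; solr_selects_per_hour is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rough hourly count of Solr search requests in
# archivesspace.out. Assumes lines start with a timestamp such as
# "[2020-03-25T08:45:29-04:00]" and that searches appear as "path=/select"
# (the indexer's writes show up as "path=/update" instead).
# Staff, PUI and indexer searches are all counted together.
solr_selects_per_hour() {
  local logfile="$1"   # e.g. archivesspace/logs/archivesspace.out
  grep -F 'path=/select' "$logfile" |
    awk '$1 ~ /^\[[0-9][0-9][0-9][0-9]-/ { print substr($1, 2, 13) }' |
    sort | uniq -c
}

# Usage:
# solr_selects_per_hour archivesspace/logs/archivesspace.out
```

Each output line is a count followed by an hour stamp such as 2020-03-25T08.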


on behalf of Steele, Henry
Sent: Wednesday, March 25, 2020 7:54 AM
To: Archivesspace Users Group
Subject: [Archivesspace_Users_Group] how to find if a certain page was accessed
in the PUI

Good morning,

We recently opened our PUI to the public and we are trying to find out about
usage, particularly of a certain page within our repository. I’m trying to
figure out if there’s any way to see this in the logs.

I’ve looked in the application log archivesspace.out, but I’m not sure what I’m
seeing there. I see records being accessed, with a response of 200, but I don’t
know if this is the staff interface, the PUI, or some indexing activity. Is
there a way in the application log to see if a certain page has been accessed
in the PUI? We have our log level set to “fatal” for the PUI, and the “pui_log”
setting is the default. I know this should mean the log only reports
problematic events, but since I see a lot of activity in the log, I’m wondering
whether this setting actually has any effect.

Alternatively, does anyone know if there might be other server logs that would
be of use? I’m looking in the Apache logs at /var/log/httpd, but I’m not sure
which of these logs would contain such information, if any.

Any information you have would be a great help. Thanks.

If this isn’t



Archivesspace_Users_Group mailing list
