Henry,

The server logs will give you one perspective.  A page-tagging approach like 
Google Analytics can give you another, along with everything else that you can 
configure (e.g. the ability to add tags with Google Tag Manager).  I only bring 
it up since it sounds like you also have Google Analytics installed on your 
site.

When looking at site traffic of an archival discovery system, I would want to 
report on the overall use of an entire finding aid.  With the ASpace PUI, that 
would mean aggregating all of the visits to the "resource", "archival_object", 
and "top_container" URLs for the same finding aid.  For example:

http://test.archivesspace.org/resources/historic_ships_records
http://test.archivesspace.org/archival_objects/water_witch_ship_1
http://test.archivesspace.org/repositories/2/top_containers/373

All three of those URLs represent different views of 
http://test.archivesspace.org/repositories/2/resources/34, but the URLs 
themselves don't tell you that, which makes setting things up especially 
important.  That's one place where Google Tag Manager could come in handy, I 
think, since you could add custom dimensions for collection ID, finding aid 
author, etc.  I say "I think" since I haven't used Google Analytics as a site 
manager in a long time, but I would love to have that type of data.
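
To make the roll-up idea concrete, here is a rough Python sketch.  It assumes 
you already have some lookup from each PUI path to its parent resource.  The 
PARENT_RESOURCE table below is hypothetical and hand-written from the example 
URLs above; in practice it would have to be built from your own ArchivesSpace 
data.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical lookup from a PUI path to the resource it belongs to.
# The entries mirror the three example URLs in this message.
PARENT_RESOURCE = {
    "/resources/historic_ships_records": "/repositories/2/resources/34",
    "/archival_objects/water_witch_ship_1": "/repositories/2/resources/34",
    "/repositories/2/top_containers/373": "/repositories/2/resources/34",
}

def visits_per_finding_aid(urls):
    """Count visits per parent resource; unknown paths are counted under None."""
    counts = Counter()
    for url in urls:
        path = urlparse(url).path.rstrip("/")
        counts[PARENT_RESOURCE.get(path)] += 1
    return counts

visited = [
    "http://test.archivesspace.org/resources/historic_ships_records",
    "http://test.archivesspace.org/archival_objects/water_witch_ship_1",
    "http://test.archivesspace.org/repositories/2/top_containers/373",
]
print(visits_per_finding_aid(visited))
```

The same grouping could in principle be done inside the analytics tool with a 
custom dimension; this only shows the shape of the aggregation.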

Blake, regarding Matomo, I've often thought that it might be really beneficial 
if an organization provided "Analytics as a Service" alongside site hosting 
(e.g. using the Matomo On-Premise option with ASpace).

Mark


________________________________
From: archivesspace_users_group-boun...@lyralists.lyrasis.org 
<archivesspace_users_group-boun...@lyralists.lyrasis.org> on behalf of Blake 
Carver <blake.car...@lyrasis.org>
Sent: Wednesday, March 25, 2020 9:10 AM
To: Archivesspace Users Group <archivesspace_users_group@lyralists.lyrasis.org>
Subject: Re: [Archivesspace_Users_Group] how to find if a certain page was 
accessed in the PUI

You (or your friendly neighborhood sysadmin) can change up the Apache logs to 
add/remove quite a few things.  There's all sorts of good stuff you can get in 
there:
http://httpd.apache.org/docs/current/mod/mod_log_config.html
________________________________
From: archivesspace_users_group-boun...@lyralists.lyrasis.org 
<archivesspace_users_group-boun...@lyralists.lyrasis.org> on behalf of Steele, 
Henry <henry.ste...@tufts.edu>
Sent: Wednesday, March 25, 2020 9:06 AM
To: Archivesspace Users Group <archivesspace_users_group@lyralists.lyrasis.org>
Subject: Re: [Archivesspace_Users_Group] how to find if a certain page was 
accessed in the PUI


Thank you, Blake.  This is all really helpful.  I will see if I can use some of 
these strategies to look through our logs.



It may not be there.  Our PUI log level is “fatal” and I’m not sure the Apache 
logs (/var/log/httpd) contain this kind of information.  But it’s certainly 
good to know how to look.



Henry Steele

Systems Librarian

Tufts University Library Technology Services

(617)627-5239



From: archivesspace_users_group-boun...@lyralists.lyrasis.org 
<archivesspace_users_group-boun...@lyralists.lyrasis.org> On Behalf Of Blake 
Carver
Sent: Wednesday, March 25, 2020 9:01 AM
To: Archivesspace Users Group <archivesspace_users_group@lyralists.lyrasis.org>
Subject: Re: [Archivesspace_Users_Group] how to find if a certain page was 
accessed in the PUI



I know quite a few people use Google Analytics.  It's not something I find 
useful at all, but it's used quite often.  Check Matomo for an open source 
analytics product; there are many others.  I know Matomo gives you the ability 
to customize things, and I bet it could be quite useful, though I've not 
touched it in many years.



I think your best bet is to get to know your Apache logs.  You should be able 
to get something useful out of there, but you'll need to learn what you're 
logging there, and maybe change it up.  Read up on Apache's "LogFormat"; it's 
pretty flexible and you can customize it on your server.  You can also 
customize where log files end up for each domain name, so that might help as 
well.  If you're running the PUI and STAFF sides on different URLs, or 
prefixes, that will help set them apart for logging.  This is all one of those 
"It Depends" kinds of things.  Using grep/awk/sed etc. will let you pull out 
different things from the logs.  Try tailing the log as you look at different 
things on the site and see how those get logged, then work up some simple greps 
to pull out just what you need every day.  This is a simple one I use to see 
the busiest sites on a server:



cat /var/log/apache2/other_vhosts_access.log.1 | awk '{print $1}' | sort | uniq -c | sort -nr | head -20

(If you're looking at that and thinking "You don't need cat in there, dummy" I 
know I know, old habits die hard)
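
If a script is easier to extend than a pipeline, the same tally can be sketched 
in Python.  This assumes, as the one-liner does, that the first 
whitespace-separated field of each line is the thing you want to count (the 
virtual host, in a vhost-style access log).  The sample lines are made up for 
illustration.

```python
from collections import Counter

def busiest(lines, top=20):
    """Tally the first whitespace-separated field of each line, busiest first."""
    counts = Counter(line.split()[0] for line in lines if line.strip())
    return counts.most_common(top)

# Made-up lines; the first field is assumed to be the virtual host.
sample = [
    'example.edu:443 4.4.4.4 - - [25/Mar/2020:12:32:22 +0000] "GET / HTTP/1.1" 200 512',
    'other.example.edu:443 4.4.4.4 - - [25/Mar/2020:12:32:23 +0000] "GET / HTTP/1.1" 200 512',
    'example.edu:443 5.5.5.5 - - [25/Mar/2020:12:32:24 +0000] "GET /search HTTP/1.1" 200 900',
]

for site, hits in busiest(sample):
    print(hits, site)

# Against a real file:
# with open("/var/log/apache2/other_vhosts_access.log.1") as log:
#     print(busiest(log))
```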



You could do the same kind of grep work on the archivesspace.out log file and 
get something out of it. You might need to experiment with loglevel on that to 
see what you can get. DEBUG is probably way too much.



Here are some nginx logs, based on real ones with some details changed to 
protect the innocent.



Here's one you might see quite often: if someone is logged into the staff side, 
you'll see this POST to check their session:

4.4.4.4 example.edu - [25/Mar/2020:12:32:22 +0000] "POST /update_monitor/poll 
HTTP/1.1" 200 4751 "https://example.edu/resources/134/edit" 
"lock_version=12&uri=%2Frepositories%2F5%2Fresources%2F134" "Mozilla/5.0 
(Macintosh; Intel Mac OS X 10.10; rv:71.0) Gecko/20100101 Firefox/71.0" "-"



Here's another one, someone looking at a resource on the staff side:

4.4.4.4 example.edu - [25/Mar/2020:12:31:03 +0000] "GET 
/resources/2774?inline=true&undefined_id=%2Frepositories%2F3%2Fresources%2F2774 
HTTP/1.1" 200 9839 "https://example.edu/resources/2774" "-" "Mozilla/5.0 
(Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) 
Chrome/80.0.3987.149 Safari/537.36" "-"



And here's a bot crawling the public side.

216.244.66.240 example.edu - [25/Mar/2020:12:29:58 +0000] "GET 
/repositories/2/archival_objects/97930 HTTP/1.1" 200 21473 "-" "-" "Mozilla/5.0 
(compatible; DotBot/1.1; http://www.opensiteexplorer.org/dotbot, 
h...@moz.com)" "-"





Depending on how you configure your Apache/nginx/whatever logs, those log lines 
will look different and you can log a bunch of different things.
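
As a sketch of how you might pull page views for one path out of lines shaped 
like the samples above, here is a small Python example.  The field layout 
(client IP, host, timestamp, request, status, bytes, referer, what looks like 
the request body, user agent) is read off those samples and is an assumption; 
your own log format will need its own pattern.  The bot check is a crude 
substring test, not a real crawler list.

```python
import re

# Pattern guessed from the sample lines in this message; adjust to your format.
LINE = re.compile(
    r'(?P<ip>\S+) (?P<host>\S+) - \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<bytes>\d+) '
    r'"(?P<referer>[^"]*)" "(?P<body>[^"]*)" "(?P<agent>[^"]*)"'
)

def hits_for(lines, path, skip_bots=True):
    """Return (time, ip, agent) for each GET of `path`, ignoring query strings."""
    hits = []
    for line in lines:
        m = LINE.match(line)
        if not m or m["method"] != "GET":
            continue
        if m["path"].split("?")[0] != path:
            continue
        if skip_bots and "bot" in m["agent"].lower():
            continue  # crude heuristic: any user agent containing "bot"
        hits.append((m["time"], m["ip"], m["agent"]))
    return hits

SAMPLE = [
    '4.4.4.4 example.edu - [25/Mar/2020:12:31:03 +0000] "GET '
    '/resources/2774?inline=true&undefined_id=%2Frepositories%2F3%2Fresources%2F2774 '
    'HTTP/1.1" 200 9839 "https://example.edu/resources/2774" "-" "Mozilla/5.0 '
    '(Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
    'Chrome/80.0.3987.149 Safari/537.36" "-"',
    '216.244.66.240 example.edu - [25/Mar/2020:12:29:58 +0000] "GET '
    '/repositories/2/archival_objects/97930 HTTP/1.1" 200 21473 "-" "-" '
    '"Mozilla/5.0 (compatible; DotBot/1.1; '
    'http://www.opensiteexplorer.org/dotbot, h...@moz.com)" "-"',
]

print(hits_for(SAMPLE, "/repositories/2/archival_objects/97930"))
print(hits_for(SAMPLE, "/repositories/2/archival_objects/97930", skip_bots=False))
```

With the bot filter on, the archival object above shows no visits; with it off, 
the DotBot request is counted.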



On the ArchivesSpace side (archivesspace/logs/archivesspace.out) the logs can 
look different depending on your log level. Here's one set to debug showing the 
indexer doing some work:



INFO: [collection1] webapp= path=/update params={} 
{add=[/repositories/2/archival_objects/33921#pui, 
/repositories/2/archival_objects/33922#pui, 
/repositories/2/archival_objects/33923#pui, 
/repositories/2/archival_objects/33924#pui, 
/repositories/2/archival_objects/33925#pui, 
/repositories/2/archival_objects/33926#pui, 
/repositories/2/archival_objects/33927#pui, 
/repositories/2/archival_objects/33928#pui, 
/repositories/2/archival_objects/33929#pui, 
/repositories/2/archival_objects/33930#pui, ... (25 adds)]} 0 6



Here's one line from me viewing a resource on the staff side.  As you can see, 
it'll be a bit more challenging to get useful stuff out of this log, but it's 
in there:



[2020-03-25T08:45:29-04:00] INFO: [collection1] webapp= path=/select 
params={facet.field=assessment_record_types&facet.field=assessment_surveyors&facet.field=assessment_review_required&facet.field=assessment_reviewers&facet.field=assessment_completed&facet.field=assessment_inactive&facet.field=assessment_survey_year&facet.field=assessment_sensitive_material&csv.escape=\&start=0&q.op=AND&fq=repository:"/repositories/3"+OR+repository:global&fq=types:("assessment")&fq=(-types:("pui_only")+AND+(assessment_record_uris:("\/repositories\/3\/resources\/406")))&fq=-exclude_by_default:true&sort=&rows=30&bq=primary_type:resource^100&q=*:*&facet.limit=20&defType=edismax&qf=four_part_id^3+title^2+finding_aid_filing_title^2+fullrecord&pf=four_part_id^4&csv.header=true&csv.encapsulator="&facet.mincount=0&wt=json&facet=true}
 hits=0 status=0 QTime=61
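
One wrinkle when grepping this log for a record: in the line above the URI 
appears with escaped slashes (\/repositories\/3\/resources\/406), so a plain 
search for the URI will miss it.  A minimal Python sketch that undoes the 
escaping first, assuming lines shaped like the one above (the sample here is a 
trimmed copy of it):

```python
import re

def mentions_record(line, uri):
    """True if a Solr /select log line references the given record URI."""
    if "path=/select" not in line:
        return False
    unescaped = line.replace("\\/", "/")
    # The lookahead stops /resources/40 from matching /resources/406.
    return re.search(re.escape(uri) + r"(?!\d)", unescaped) is not None

# Trimmed copy of the log line above.
sample = (
    '[2020-03-25T08:45:29-04:00] INFO: [collection1] webapp= path=/select '
    'params={fq=(assessment_record_uris:("\\/repositories\\/3\\/resources\\/406"))'
    '&wt=json&facet=true} hits=0 status=0 QTime=61'
)

print(mentions_record(sample, "/repositories/3/resources/406"))
```

Note that a match only says the record URI showed up in a Solr query; it does 
not by itself say whether the staff side, the PUI, or the indexer asked.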



________________________________

From: archivesspace_users_group-boun...@lyralists.lyrasis.org 
<archivesspace_users_group-boun...@lyralists.lyrasis.org> on behalf of Steele, 
Henry <henry.ste...@tufts.edu>
Sent: Wednesday, March 25, 2020 7:54 AM
To: Archivesspace Users Group <archivesspace_users_group@lyralists.lyrasis.org>
Subject: [Archivesspace_Users_Group] how to find if a certain page was accessed 
in the PUI



Good morning,



We recently made our PUI open to the public and we are trying to find out about 
usage, particularly of a certain page within our repository.  I’m trying to 
figure out if there’s any way to see this in the logs.



I’ve looked in the application log archivesspace.out, but I’m not sure what I’m 
seeing here.  I see records being accessed, with a response of 200, but I don’t 
know if this is the staff interface, the PUI, or if it’s some indexing 
activity.  Is there a way in the application log to see if a certain page has 
been accessed in the PUI?  We have our log level set to “fatal” for the PUI, 
and the “pui_log” is default.  I know this should mean the log only reports on 
problematic events, but since I see a lot of activity in the log, I’m wondering 
if this setting doesn’t actually have any effect.



Alternatively, does anyone know if there might be other server logs that would 
be of use?  I’m looking in the Apache logs at /var/log/httpd but I’m not sure 
which of these logs would contain such information, if any.



Any information you have would be of great help.  Thanks.



If this isn’t



Henry Steele

Systems Librarian

Tufts University Library Technology Services

(617)627-5239


_______________________________________________
Archivesspace_Users_Group mailing list
Archivesspace_Users_Group@lyralists.lyrasis.org
http://lyralists.lyrasis.org/mailman/listinfo/archivesspace_users_group
