This is an automated email from the ASF dual-hosted git repository.
sebb pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/whimsy.git
The following commit(s) were added to refs/heads/master by this push:
new bed98882 Use node scanner to find external refs
bed98882 is described below
commit bed98882a6f19c6e51ed22c0043fbfa198d56621
Author: Sebb <[email protected]>
AuthorDate: Mon May 2 15:11:04 2022 +0100
Use node scanner to find external refs
---
tools/site-scan.rb | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/tools/site-scan.rb b/tools/site-scan.rb
index 9859bd7c..c55d99bb 100755
--- a/tools/site-scan.rb
+++ b/tools/site-scan.rb
@@ -148,10 +148,15 @@ def parse(id, site, name)
data[:image] = ASF::SiteImage.find(id)
# Check for resource loading from non-ASF domains
- ext_urls = doc.xpath('//script/@src', '//link/@href', '//img/@src').
- map(&:content).map {|x| ASFDOMAIN.to_ext_host x}.compact.tally
- resources = ext_urls.values.sum
- data[:resources] = "Found #{resources} external resources: #{ext_urls}"
+ cmd = ['node', '/srv/whimsy/tools/scan-page.js', site]
+ out, err, status = Open3.capture3(*cmd)
+ if status.success?
+ ext_urls = out.split("\n").reject {|x| ASFDOMAIN.asfhost? x}.tally
+ resources = ext_urls.values.sum
+ data[:resources] = "Found #{resources} external resources: #{ext_urls}"
+ else
+ data[:resources] = err
+ end
# TODO: does not find js references such as:
# ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
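
For readers who want to try the pattern introduced by this change outside of whimsy, the following is a minimal sketch, not the project's implementation. It assumes the scanner prints one host name per line, which the diff suggests but does not state. The helper asf_host? is a hypothetical stand-in for ASFDOMAIN.asfhost?, whose real rules may differ, and a child Ruby process stands in for the node scanner so the example runs without node or scan-page.js.

```ruby
require 'open3'
require 'rbconfig'

# Hypothetical stand-in for ASFDOMAIN.asfhost?; the real helper lives in
# whimsy and may apply different rules.
def asf_host?(host)
  host == 'apache.org' || host.end_with?('.apache.org')
end

# Run a scanner command (given as an argv array, so no shell is involved)
# and summarise the non-ASF hosts it prints, assuming one host per line.
# On a non-zero exit status the captured stderr is returned instead.
def summarize_external(cmd)
  out, err, status = Open3.capture3(*cmd)
  return err unless status.success?

  ext_urls = out.split("\n").reject { |host| asf_host?(host) }.tally
  "Found #{ext_urls.values.sum} external resources: #{ext_urls}"
end

# Stand-in for the node scanner: prints one ASF host and one external
# host that occurs twice.
fake_scanner = [RbConfig.ruby, '-e',
                'puts %w[www.apache.org cdn.example.com cdn.example.com]']
summary = summarize_external(fake_scanner)
puts summary

# Stand-in for a failing scanner: writes to stderr and exits non-zero.
failing_scanner = [RbConfig.ruby, '-e', 'warn "boom"; exit 1']
failure = summarize_external(failing_scanner)
puts failure
```

Passing the command as an array, as the commit does, avoids shell interpretation of the site URL. Storing stderr on failure mirrors the else branch of the diff, so a scanner error is recorded in the same field as a normal result.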