This is an automated email from the ASF dual-hosted git repository.

sebb pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/whimsy.git


The following commit(s) were added to refs/heads/master by this push:
     new bed98882 Use node scanner to find external refs
bed98882 is described below

commit bed98882a6f19c6e51ed22c0043fbfa198d56621
Author: Sebb <[email protected]>
AuthorDate: Mon May 2 15:11:04 2022 +0100

    Use node scanner to find external refs
---
 tools/site-scan.rb | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/tools/site-scan.rb b/tools/site-scan.rb
index 9859bd7c..c55d99bb 100755
--- a/tools/site-scan.rb
+++ b/tools/site-scan.rb
@@ -148,10 +148,15 @@ def parse(id, site, name)
   data[:image] = ASF::SiteImage.find(id)
 
   # Check for resource loading from non-ASF domains
-  ext_urls  = doc.xpath('//script/@src', '//link/@href', '//img/@src').
-    map(&:content).map {|x| ASFDOMAIN.to_ext_host x}.compact.tally
-  resources = ext_urls.values.sum
-  data[:resources] = "Found #{resources} external resources: #{ext_urls}"
+  cmd = ['node', '/srv/whimsy/tools/scan-page.js', site]
+  out, err, status = Open3.capture3(*cmd)
+  if status.success?
+    ext_urls = out.split("\n").reject {|x| ASFDOMAIN.asfhost? x}.tally
+    resources = ext_urls.values.sum
+    data[:resources] = "Found #{resources} external resources: #{ext_urls}"
+  else
+    data[:resources] = err
+  end
 
   #  TODO: does not find js references such as:
   #  ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
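
The new code path in the hunk above (run an external scanner, then tally the non-ASF entries, or report stderr on failure) can be sketched in isolation as follows. This is a minimal illustration, not the code in tools/site-scan.rb: asf_host? is a hypothetical stand-in for ASFDOMAIN.asfhost?, whose implementation is not shown in this commit, and the sketch assumes the scanner prints one absolute URL per line on stdout.

```ruby
require 'open3'
require 'uri'

# Hypothetical stand-in for ASFDOMAIN.asfhost? (the real helper is not shown
# in this commit). Treats apache.org and its subdomains as ASF hosts.
def asf_host?(url)
  host = URI.parse(url).host
  !host.nil? && (host == 'apache.org' || host.end_with?('.apache.org'))
rescue URI::InvalidURIError
  false
end

# Runs cmd (an argv array) and expects one URL per line on stdout (assumption).
# Returns a summary string on success, or the captured stderr on failure,
# mirroring the two branches in the diff.
def external_resources(cmd)
  out, err, status = Open3.capture3(*cmd)
  return err unless status.success?

  ext_urls = out.split("\n").reject { |url| asf_host?(url) }.tally
  "Found #{ext_urls.values.sum} external resources: #{ext_urls}"
end
```

Passing the command as an argv array, as the commit does, avoids going through a shell, so the site URL is not subject to shell interpretation. Array#tally requires Ruby 2.7 or later.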
