---------- Forwarded message ---------
From: 'Livingood, Jason' via National Broadband Mapping Coalition <bbcoalit...@marconisociety.org>
Date: Fri, Mar 3, 2023 at 5:40 AM
Subject: Inform: M-Labs, ~12% of network identifications incorrect
To: National Broadband Mapping Coalition <bbcoalit...@marconisociety.org>

FYI for those folks using data from M-Labs.

From: 'Stephen Soltesz' via discuss <disc...@measurementlab.net>
Reply-To: Stephen Soltesz <solt...@google.com>
Date: Thursday, March 2, 2023 at 15:50
To: discuss <disc...@measurementlab.net>
Subject: [EXTERNAL] [M-Lab-Discuss] 8-12% missing or misattributed network annotations between 2020-03-10 and 2023-02-09

This only affects “client.Network” (e.g. ASN and ASName) annotations on M-Lab data collected between 2020-03-10 and 2023-02-09. The “client.Geo” (e.g. Latitude, Longitude, Subdivision1ISOCode, City and Country) annotations are not affected. We are working to correct these annotations by early April or sooner.

Impact

The network annotations on all data collected between 2020-03-10 and 2023-02-09 may be incorrect. We estimate ~7-10% are missing and ~1-2% are attributed to an incorrect (larger) network address block. These incorrect annotations were not random and depend on the client IP being annotated. So, if a client IP was annotated incorrectly, it would continue to receive an incorrect annotation.

We deployed a fix for new annotations on 2023-02-09. So, all data collected since 2023-02-10 will be correct.

We are working on a plan to repair the historical network annotations between 2020-03-10 and 2023-02-09. Unfortunately, until the historical data is reprocessed we will not know precisely which historical annotations are incorrect. We cannot identify present-but-incorrect annotations until we recreate the annotations correctly.

For aggregate analysis using the ASNs, you should expect ~1-2% errors. For analysis targeting specific networks and depending on the ASN annotations, the impact is harder to quantify and could be much higher.

Context

On 2020-03-10, M-Lab introduced a measurement annotation process (uuid-annotator) that runs at measurement-time on nodes rather than during post-processing by the data pipeline.
This architectural change decoupled the collection of annotations from the need to archive client IP addresses. However, we recently discovered that the percentage of missing annotations was unexpectedly high, ~10%. After further investigation, we discovered a fundamental bug in the uuid-annotator's network annotations that resulted in both the missing annotations and the potential for misattributed annotations. Based on a prototype reprocessor, we estimate that 1-2% of annotations carry incorrect ASNs because a shorter network prefix was chosen over a correct longer prefix, e.g. 12.0.0.0/8 vs 12.a.b.0/24.

Repair

Because the annotation and hopannotation1 datatypes are collected at measurement-time without the client IP, these annotations were originally intended to be created once and loaded directly into BigQuery by the data pipeline. Recreating the annotation was not part of the original design. So, to repair these network annotations we must build a new data processing utility to recreate the annotation archives and reprocess them with the existing pipeline. We estimate this work will take four to six weeks, ideally finishing by early April.

More information and updates will be added here:
m-lab/data-annotations#34: 8-12% Missing or incorrect Network annotations 2020-03-10 to 2023-02-09

Please let us know if you have any questions or concerns.
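
For readers unfamiliar with the prefix issue mentioned under Context (a shorter prefix such as 12.0.0.0/8 chosen over a longer, more specific one), here is a minimal Python sketch of the difference between picking any covering prefix and picking the longest one. It is not the uuid-annotator code, and it does not claim to reproduce the actual bug. The route table, the concrete /24 prefix, and the ASNs (taken from the range reserved for documentation) are all hypothetical.

```python
import ipaddress

# Hypothetical prefix-to-ASN table. The prefixes and ASNs are illustrative
# only and are not taken from M-Lab data.
ROUTES = [
    (ipaddress.ip_network("12.0.0.0/8"), 64496),     # broad covering block
    (ipaddress.ip_network("12.34.56.0/24"), 64497),  # more specific block
]


def first_match(ip, routes):
    """Return the ASN of the first covering prefix (depends on table order)."""
    addr = ipaddress.ip_address(ip)
    for net, asn in routes:
        if addr in net:
            return asn
    return None  # no covering prefix: the annotation would be missing


def longest_prefix_match(ip, routes):
    """Return the ASN of the most specific (longest) covering prefix."""
    addr = ipaddress.ip_address(ip)
    covering = [(net, asn) for net, asn in routes if addr in net]
    if not covering:
        return None
    # The covering network with the largest prefix length wins.
    _, asn = max(covering, key=lambda item: item[0].prefixlen)
    return asn


if __name__ == "__main__":
    ip = "12.34.56.78"
    print(first_match(ip, ROUTES))           # 64496, the broader /8
    print(longest_prefix_match(ip, ROUTES))  # 64497, the more specific /24
```

Because both lookups are deterministic for a given client IP, an address that resolves to the broader block does so every time, which matches the note under Impact that the incorrect annotations were not random.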

--
A pithy note on VOQs vs SQM: https://blog.cerowrt.org/post/juniper/
Dave Täht CEO, TekLibre, LLC
_______________________________________________
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat