Hi,

I've just turned my cascadia from a thin wrapper around the Go Cascadia package <https://github.com/andybalholm/cascadia> into a poor man's scraper tool. Please check it out at https://github.com/suntong/cascadia
Here are some excerpts:

The Go Cascadia package <https://github.com/andybalholm/cascadia> implements CSS selectors for HTML. This is the command-line tool, which started as a thin wrapper around that package but is growing into a better tool for testing CSS selectors without writing Go code:

Block selection mode

The single selection mode outputs the selection as HTML source. If you want the text content instead, use the block selection mode:

    $ echo '<div class="container"><p align="justify"><b>Name: </b>John Doe</p></div>' | tee /tmp/cascadia.xml | cascadia -i -o -c 'div > p'
    1 elements for 'div > p':
    <p align="justify"><b>Name: </b>John Doe</p>

    $ cat /tmp/cascadia.xml | cascadia -i -o -c 'div' --piece SelText='p'
    SelText
    Name: John Doe

However, the real power of *block selection mode* lies in its ability to produce TSV/CSV tables without any Go programming:

    $ curl --silent https://news.ycombinator.com | cascadia -i -o -c 'tr.athing' -p No=span.rank -p Title='td.title > a' -p Site=span.sitestr
    No	Title	Site
    1.	Onedrive is slow on Linux but fast with a "Windows" user-agent (2016)	microsoft.com
    2.	Starting today, users of Firefox can also enjoy Netflix on Linux	netflix.com
    3.	Research Debt	distill.pub
    ...
    27.	USPS Informed Delivery - Digital Images of Front of Mailpieces	usps.com
    28.	Performance bugs - the dark matter of programming bugs	forwardscattering.org
    29.	Most items of clothing have complicated international journeys	bbc.co.uk
    30.	High-performance employees need quieter work spaces	qz.com

It is a poor man's scraper tool if text is the only thing needed. For scraping beyond text, go one step further and use andrew-d/goscrape <https://github.com/andrew-d/goscrape> (or my goscrape <https://github.com/suntong/goscrape>, which has some enhancements to it). Again, if text is all you need, cascadia might already be enough.
Here is how to scrape Hacker News *across several pages*:

    $ curl --silent 'https://news.ycombinator.com/news?p=[1-3]' | cascadia -i -o -c 'tr.athing' -p No=span.rank -p Title='td.title > a' -p Site=span.sitestr
    No	Title	Site
    1.	Starting today, users of Firefox can also enjoy Netflix on Linux	netflix.com
    2.	Onedrive is slow on Linux but fast with a "Windows" user-agent (2016)	microsoft.com
    3.	Research Debt	distill.pub
    ...
    27.	Yes I Still Want to Be Doing This at 56 (2012)	thecodist.com
    28.	Performance bugs - the dark matter of programming bugs	forwardscattering.org
    29.	USPS Informed Delivery - Digital Images of Front of Mailpieces	usps.com
    30.	High-performance employees need quieter work spaces	qz.com
    31.	Most items of clothing have complicated international journeys	bbc.co.uk
    32.	Telstra's Gigabit Class LTE Network	cellularinsights.com
    ...
    58.	The New Laptop Ban Adds to Travelers' Lack of Privacy and Security	eff.org
    59.	QEMU: user-to-root privesc inside VM via bad translation caching	chromium.org
    60.	Startups that debuted at Y Combinator W17 Demo Day 2	techcrunch.com
    61.	The Cracking Monolith: Forces That Call for Microservices	semaphoreci.com
    62.	Amsterdam Airport Launches API Platform	schiphol.nl
    ...
    88.	Founder Stories: Leah Culver of Breaker (YC W17)	ycombinator.com
    89.	Find out what you, or someone on your team, did on the last working day	github.com
    90.	PSD2 - a directive that will change banking in Europe	evry.com
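The [1-3] in the URL above relies on curl's URL globbing, which requests pages 1, 2 and 3 in turn and concatenates the responses. For anyone who would rather drive the fetching from Go, here is a rough sketch of just the expansion step. It handles only a single plain numeric range; curl itself supports more (steps, zero padding, letter ranges, brace lists). The function name expandRange is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// rangeRe matches one curl-style numeric range such as [1-3].
var rangeRe = regexp.MustCompile(`\[(\d+)-(\d+)\]`)

// expandRange expands the first [N-M] range in url into one URL per number.
// A URL without a range is returned unchanged as a single-element slice.
func expandRange(url string) ([]string, error) {
	loc := rangeRe.FindStringSubmatchIndex(url)
	if loc == nil {
		return []string{url}, nil
	}
	first, err := strconv.Atoi(url[loc[2]:loc[3]])
	if err != nil {
		return nil, err
	}
	last, err := strconv.Atoi(url[loc[4]:loc[5]])
	if err != nil {
		return nil, err
	}
	if first > last {
		return nil, fmt.Errorf("bad range [%d-%d]", first, last)
	}
	urls := make([]string, 0, last-first+1)
	for n := first; n <= last; n++ {
		// Replace the whole "[N-M]" token with the current number.
		urls = append(urls, url[:loc[0]]+strconv.Itoa(n)+url[loc[1]:])
	}
	return urls, nil
}

func main() {
	urls, err := expandRange("https://news.ycombinator.com/news?p=[1-3]")
	if err != nil {
		panic(err)
	}
	for _, u := range urls {
		fmt.Println(u)
	}
}
```

Each resulting URL could then be fetched with net/http and the bodies piped to cascadia one after another, which is effectively what the curl pipeline does.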
