Date: 03 Oct 2022
Module: Scrapy
Installation: pip install Scrapy
About:
Scrapy is a fast high-level web crawling and web scraping framework, used
to crawl websites and extract structured data from their pages. It can be
used for a wide range of purposes, from data mining to monitoring and
automated testing.
Sample:
import scrapy


class ToScrapeCSSSpider(scrapy.Spider):
    name = "toscrape-css"
    start_urls = [
        'http://quotes.toscrape.com/',
    ]

    def parse(self, response):
        # Every quote on the page lives in a <div class="quote"> element
        for quote in response.css("div.quote"):
            # Yield one plain dict (an "item") per quote
            yield {
                'text': quote.css("span.text::text").extract_first(),
                'author': quote.css("small.author::text").extract_first(),
                'tags': quote.css("div.tags > a.tag::text").extract(),
            }

        # Follow the "Next" pager link, if any; the new page is parsed by
        # this same parse() method because no other callback is given
        next_page_url = response.css("li.next > a::attr(href)").extract_first()
        if next_page_url is not None:
            yield scrapy.Request(response.urljoin(next_page_url))
Execution (with the sample saved as scrape_sample.py):
scrapy runspider scrape_sample.py -o quotes.json
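With a .json file name, the feed export writes a single JSON array containing every dict the spider yielded, so the result can be loaded with the standard json module. One caveat from the Scrapy docs: -o appends to an existing file, and appending to a JSON file leaves it as invalid JSON, so delete quotes.json between runs (or use -O, which overwrites, on Scrapy 2.4 and later). The helper names load_feed and quotes_per_author below are hypothetical, chosen only for this sketch.

```python
import json
from collections import Counter


def load_feed(path):
    # The JSON feed is one array of items, so a single json.load() reads it all
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def quotes_per_author(items):
    # Tally how many scraped quotes belong to each author
    return Counter(item["author"] for item in items)
```

For example, quotes_per_author(load_feed("quotes.json")).most_common(5) lists the five most-quoted authors after a crawl.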
Reference:
https://pypi.org/project/Scrapy/
_______________________________________________
Chennaipy mailing list
https://mail.python.org/mailman/listinfo/chennaipy