Re: [CODE4LIB] twarc and 30-day limitation

Edward Summers Tue, 05 May 2020 05:22:35 -0700

Hi Eric,

Like Francis and Darnelle said, Twitter's primary free search API is limited to 
the last 7 days of activity. The so-called "Standard" search API is what twarc 
uses to gather data when you run `twarc search …`.


However, a couple of years ago Twitter added the Premium Search API [1], which 
is a hybrid approach that lets you search two endpoints (30-day and full 
archive) and is engineered to move you from collecting data for free to paying 
Twitter as you (inevitably) want to gather more.
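
To make the time windows concrete, here is a minimal sketch that picks the 
narrowest search tier able to reach back to a given date. The function name 
and the tier labels are hypothetical, made up for illustration; they are not 
part of twarc or of Twitter's API. The 7-day and 30-day windows are the ones 
described above.

```python
from datetime import datetime, timedelta, timezone

# Windows as described in this thread: the Standard search API covers
# roughly the last 7 days; Premium adds a 30-day endpoint and a
# full-archive endpoint.
STANDARD_WINDOW = timedelta(days=7)
THIRTY_DAY_WINDOW = timedelta(days=30)


def search_tier_for(oldest_tweet, now=None):
    """Return a (hypothetical) label for the narrowest tier that reaches
    back as far as `oldest_tweet`, a timezone-aware datetime."""
    now = now or datetime.now(timezone.utc)
    age = now - oldest_tweet
    if age <= STANDARD_WINDOW:
        return "standard"
    if age <= THIRTY_DAY_WINDOW:
        return "premium-30day"
    return "premium-fullarchive"
```

So anything older than 30 days, which sounds like your case, would land on 
the full-archive endpoint.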

From your email it sounds like you want to use the Full Archive endpoint? 
Adding premium support to twarc has been on the Documenting the Now roadmap, 
but we haven't quite got around to it yet.

I went ahead and created a GitHub issue for you to track our progress [2]. It 
actually shouldn't be too difficult to add, so if you have a present need let 
us know so we can prioritize it higher.

//Ed

PS. As Francis mentioned, twint gets around Twitter's API constraints by 
scraping Twitter's search results web page. Scraping comes with its own set of 
complexities, the biggest being that Twitter actively works to prevent it, 
which (in my experience) can make twint a bit unpredictable to use at times.

[1] https://developer.twitter.com/en/docs/tweets/search/overview/premium
[2] https://github.com/DocNow/twarc/issues/326
